On Tue, Feb 16, 2010 at 8:55 PM, Leonardo Collado Torres <[email protected]> wrote:
> Hello everyone,
>
> How are you doing? I hope that everything is working out great for you ^_^.
>
> Anyhow, I'm emailing you because I have a Subversion / R / HTS related
> question. A few of us (4 right now) in my lab want to analyze some Illumina
> GAIIx data and the idea is to use R as the backbone. We want to keep all the
> results in .Rdata format and kind of build an "internal" package so that the
> biologists at the lab could then load the tables easily. Kind of what
> Patrick Aboyoun told us at BioC2009. So, we want:
>
> A) Major Script
> This one will call the individual scripts that do a step on the workflow. It
> will help us remember what we did and in what order. Actually, a .Rnw
> vignette file would be much better.

Yep, a vignette is a good way to go. If you are doing time-consuming
things, be sure to check out the weaver package.

> B) Individual Scripts
> These will have code but no function definitions. For example, in one of
> these you could call the "aligner" through a function, then read the
> results, find the read coverage per base, make a plot. Kind of analysis
> modules.

There is also no real reason these scripts could not be in another
package; for example, you could have one package for each "project".
Using a package allows one to document objects and code as well as to
specify dependencies formally (a script might depend on several other
packages, for example).

> C) Package
> There we'll define all the functions that will be called by the
> "individual" scripts, examples, documentation for the functions, etc. Also,
> we'll save the results from the scripts as R objects; most likely, data
> frames. Some might be large (10 MB?).
>
> The Illumina data and some big files like the alignments will not be kept in
> the package.
>
> The idea is that someone or a small team will develop individual scripts,
> but the package and the major script will be edited by everyone
> participating. Now, I think that using Subversion is the way to go. However,
> I'm puzzled at what SVN hosting service we should use... We are not building
> open source software; it's more like a data package -- VJ Carey talked about
> them at BioC2009. Eventually it would be great to share the package, but for
> some months it will all be a work in progress meant to be seen only by those
> in the lab/project. In a bad scenario the package would never make it out of
> the lab.
>
> I'm not aware if there is a public SVN hosting service that meets our needs.
> I guess that we could use Google Code or R-Forge (just to mention a few) and
> not distribute the URL for those "lab-only" months -- anyone could
> randomly find it.
> Or should we hire one of the commercial SVN hosting
> services to keep the work private? (check
> http://www.svnhostingcomparison.com/ ) Hosting it at a local server is a
> problem for us since they are quite restrictive and svn checkouts/commits
> would most likely be blocked. They've had bad luck with exterior attacks on
> the servers.
>
> Otherwise I think that all the people involved could use the same server
> user and use SVN only at the server. Something very similar to using SVN on
> your laptop with 2 directories: the checkout one and the "repository" one
> (check
> http://www.guyrutenberg.com/2007/10/29/creating-local-svn-repository-home-repository/
> ).
>
>
> As you can notice, I'm quite the newbie on SVN and working collaboratively
> with Illumina GA data. Any tips are more than welcome :) I also asked on
> SEQanswers: http://seqanswers.com/forums/showthread.php?t=4071

Hi, Leonardo. SVN is not too hard to set up, but you will probably want
to set it up behind Apache. However, you might consider other systems as
well.

http://en.wikipedia.org/wiki/Revision_control

The main discussion point, in my opinion, is whether to use a
distributed system (git, bazaar, mercurial) or a centralized system like
svn. I actually prefer a distributed system (I use git) over svn, but
that is just personal preference. Because much work is done with svn, I
interface with svn using git-svn (so that even my interactions with the
Bioconductor svn server are via git).

No matter what system you go with, make sure that it is well backed up!

Sean

> Thank you and greetings,
> Leonardo
>
> --
> Leonardo Collado Torres, Bachelor in Genomic Sciences
> Member of Dr.
> Enrique Morett's lab and Winter Genomics
> UNAM Campus Cuernavaca, Mexico
>
> Homepage: http://www.lcg.unam.mx/~lcollado/
> Phone: [52] (777) 313-28-05
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
