Hello everyone,

How are you doing? I hope that everything is working out great for you ^_^.

Anyhow, I'm emailing you because I have a Subversion / R / HTS related question. A few of us (4 right now) in my lab want to analyze some Illumina GAIIx data, and the idea is to use R as the backbone. We want to keep all the results in .Rdata format and build an "internal" package so that the biologists in the lab can load the tables easily, along the lines of what Patrick Aboyoun told us at BioC2009. So, we want:

A) Major Script
This one will call the individual scripts, each of which does one step of the workflow. It will help us remember what we did and in what order. Actually, a .Rnw vignette file would be much better.
B) Individual Scripts
These will have code but no function definitions. For example, in one of these you could call the "aligner" through a function, then read the results, find the read coverage per base, and make a plot. Think of them as analysis modules.
C) Package
Here we'll define all the functions that will be called by the "individual" scripts, along with examples, documentation for the functions, etc. We'll also save the results from the scripts as R objects, most likely data frames. Some might be large (10 MB?).
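
To make A), B) and C) more concrete, here is a rough sketch of the directory layout I have in mind. The names "htsdata", "master.R", "01_align.R" and "02_coverage.R" are placeholders I made up for illustration; only DESCRIPTION, R/, man/ and data/ are standard pieces of an R package.

```shell
# Sketch only: "htsdata" and all script names are hypothetical placeholders.
ROOT="$(mktemp -d)/project"

# C) the package: function definitions, their documentation, saved results
mkdir -p "$ROOT/htsdata/R" "$ROOT/htsdata/man" "$ROOT/htsdata/data"
printf 'Package: htsdata\nVersion: 0.0.1\n' > "$ROOT/htsdata/DESCRIPTION"

# B) the individual scripts: one analysis module per file, no function definitions
mkdir -p "$ROOT/scripts"
touch "$ROOT/scripts/01_align.R" "$ROOT/scripts/02_coverage.R"

# A) the major script: runs the modules in order, so it records what was done
cat > "$ROOT/master.R" <<'EOF'
source("scripts/01_align.R")
source("scripts/02_coverage.R")
EOF

# Show the resulting tree
find "$ROOT" | sort
```

The .Rdata result tables would go under htsdata/data/, while the raw Illumina files and the alignments would live somewhere else entirely.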

The Illumina data and some big files like the alignments will not be kept in the package.

The idea is that one person or a small team will develop each individual script, but the package and the major script will be edited by everyone participating. Now, I think that using Subversion is the way to go. However, I'm unsure which SVN hosting service we should use... We are not building open source software; it's more like a data package (VJ Carey talked about them at BioC2009). Eventually it would be great to share the package, but for some months it will all be a work in progress meant to be seen only by those in the lab/project. In the worst case, the package would never make it out of the lab.

I'm not aware of a public SVN hosting service that meets our needs. I guess that we could use Google Code or Rforge (just to mention a few) and not distribute the URL during those "lab-only" months, but anyone could still stumble upon it. Or should we pay for one of the commercial SVN hosting services to keep the work private? (check http://www.svnhostingcomparison.com/ ) Hosting it on a local server is a problem for us: the server admins are quite restrictive, since they've had bad luck with outside attacks on the servers, and svn checkouts/commits would most likely be blocked.

Otherwise, I think that everyone involved could share the same user account on the server and use SVN only on the server itself. That would be very similar to using SVN on your laptop with 2 directories: the checkout one and the "repository" one (check http://www.guyrutenberg.com/2007/10/29/creating-local-svn-repository-home-repository/ ).
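
As far as I understand it, the two-directory setup would look roughly like the sketch below. The temporary directory stands in for a shared directory on the server, and the file name and log message are made up; I haven't tried this on our server yet.

```shell
# Sketch only: paths, file name and log message are hypothetical.
WORK=$(mktemp -d)        # stand-in for a shared directory on the server
REPO="$WORK/repo"        # the "repository" directory
WC="$WORK/checkout"      # the checkout (working copy) directory

if command -v svnadmin >/dev/null 2>&1; then
  svnadmin create "$REPO"               # create an empty repository on disk
  svn checkout "file://$REPO" "$WC"     # check it out through a file:// URL

  # Add a first file and commit it back to the repository
  echo 'Package: htsdata' > "$WC/DESCRIPTION"
  svn add "$WC/DESCRIPTION"
  svn commit -m "Start the data package" "$WC"

  svn log "file://$REPO"                # the history now shows that commit
else
  echo "Subversion is not installed here" >&2
fi
```

One thing I notice: through file:// the commit author is the system user, so with a single shared account every commit would show the same author in "svn log". We would probably have to sign our log messages by hand.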


As you can tell, I'm quite the newbie at SVN and at working collaboratively with Illumina GA data. Any tips are more than welcome :) I also asked on SEQanswers: http://seqanswers.com/forums/showthread.php?t=4071

Thank you and greetings,
Leonardo

--
Leonardo Collado Torres, Bachelor in Genomic Sciences
Member of Dr. Enrique Morett's lab and Winter Genomics
UNAM Campus Cuernavaca, Mexico

Homepage: http://www.lcg.unam.mx/~lcollado/
Phone: [52] (777) 313-28-05

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
