On Thu, Jun 5, 2008 at 8:55 PM, Paul Leo <[EMAIL PROTECTED]> wrote:
> I've been reading with great interest since Bioc-sig-seq started, we finally
> have a Solexa GA2... but still in the box so I have no hands on experience
> yet...
> Must admit I've not sorted out in my head what parts of the analysis are most
> suited to R (my preference is as much as possible, but I understand some of
> the limitations...)
> Anyway the new hardware comes with a cluster of PCs (IPAR) that does the
> image analysis and they promise will do the base calling soon (not alignment?)
>
Hi, Paul. You might want to look at http://seqanswers.com/ for some ideas.
That said, I'll give you my $0.02 worth. There are several good options for
doing alignments, including R/Biostrings, MAQ, ELAND, SOAP, and others.
> Given this, and that we like using R when possible, and that we have some
> modest resources, can someone comment between the suggested server for the
> GA2 system: quad Xeon 7000 series with 32GB RAM versus a quad Xeon box with
> the 5000 series processor (3 GHz). We have some experience setting up the
> 5000 series box which is about ½ the cost of the 7000 series. We do have
> High Performance computing at an off-site location but transferring LARGE
> files to that location may be an issue...so would like to do as much in-house
> as possible.
>
If you have IPAR, the total file size will not include the images, so you may
be able to go the off-site route.
> Do you find that the analysis you have done benefits greatly from the faster
> processor or would 2x 5000 boxes be a better option in your opinion? Any
> comments on the amount of memory per processor, I was thinking that perhaps
> this should be higher than recommended if we like to use R? These things I
> need to resolve before I use the sequencer unfortunately. I see in some posts
> that people are using machines with 64GB of memory, is that typical?
>
We deal with 32GB on our 8-processor machines. The machine is not the
limiting factor for any part of the Solexa pipeline, and any of the alignment
algorithms mentioned above (except perhaps Biostrings) can be run pretty
easily against the human genome on all eight processors simultaneously with
only 32GB of RAM. That said, memory is "relatively" cheap, so if you go with
32GB up front, you may want to configure the machine to allow future
expansion, or just go with 64GB to start.
> Also if anyone with real world experience can comment on the typical size of
> the alignment file (for paired end reads on a good day), that is the
> s_N_export.txt file generated by ELAND and the s*_sequences.txt file
> generated by GERALD that would be helpful too (I have the standard product
> info). Are there other files generated by the pipeline that you have found
> particularly useful in downstream analysis or that are useful in other 3rd
> party applications that you have tried?
>
If you are asking about disk space, think pretty big, and think expandable if
you can. The Short Read Archive (SRA) at NCBI is accepting submissions of
Solexa data. Basically, the entire Bustard directory is needed for this, so
think about saving at least the Bustard and GERALD (or equivalent)
directories; ideally you could save the Firecrest directory as well.

Sean

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
