Hi Sebastien and denovoassemblers,

I recently started experimenting with Ray. It seems pretty cool, and orders of magnitude more memory efficient than other assemblers I have used in the past. Unfortunately, the results I am getting are a bit odd.

I am working on assembling a bacterial genome, and have assembled some large contigs (max size = 600kb) using 454SE reads and Newbler. I have ~20x coverage, and optical mapping results (basically a fancy restriction digest map) from the same genome suggest that my large contigs are quite good: ~85% of the genome is covered by the large contigs and there is no evidence of misassembly. I suspect that gaps remain in my assembly due to small repetitive elements, and in fact a coverage analysis suggests that some contigs have as many as 8 copies in the genome! The documentation for Ray suggests that it does quite well at resolving these repetitive elements, so I was excited to try it on my data.
I am using Ray 1.7.0. Initially I tried to run Ray on the assembled contigs only, to see if it would deal with any of the overlaps identified through optical mapping. The assembled contigs have these stats:

    numberOfContigs    = 85
    numberOfBases      = 6302194
    avgContigSize      = 74143
    N50ContigSize      = 133881
    largestContigSize  = 613481
    Q40PlusBases       = 6290676, 99.82%
    Q39MinusBases      = 11518, 0.18%

This attempt quickly generated an assembler panic:

$ Ray -s AllContigs.fasta -out test1
------------------------------------------------------------------------------------
*** Step: K-mer counting
Date: Fri Jan 27 10:07:32 2012
Elapsed time: 5 seconds
Since beginning: 6 seconds
***
Rank 0 has 16364 k-mers (completed)
Rank 0: the minimum coverage is 2
Rank 0: the peak coverage is 2
Rank 0: Assembler panic: no peak observed in the k-mer coverage distribution.
Rank 0: to deal with the sequencing error rate, try to lower the k-mer length (-k)
Rank 0: sent 10431 messages, received 10430 messages.
------------------------------------------------------------------------------------

I then tried adding the original set of 454 reads as well as the contigs:

Ray -s AllContigs.fasta -s Sample1_Reads.fasta -o test2

This allowed Ray to run; however, the results are not quite what I expected.

OutputNumbers.txt:

Contigs >= 100 nt
 Number: 2847
 Total length: 6101858
 Average: 2143
 N50: 4105
 Median: 1236
 Largest: 29655
Contigs >= 500 nt
 Number: 2057
 Total length: 5896767
 Average: 2866
 N50: 4349
 Median: 1876
 Largest: 29655
Scaffolds >= 100 nt
 Number: 2847
 Total length: 6101858
 Average: 2143
 N50: 4105
 Median: 1236
 Largest: 29655
Scaffolds >= 500 nt
 Number: 2057
 Total length: 5896767
 Average: 2866
 N50: 4349
 Median: 1876
 Largest: 29655

So, I'm not exactly using Ray for its designed purpose, but I am curious why it is breaking apart my large contigs and producing an assembly with fewer assembled bases than I originally fed it.
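To put the two sets of numbers above on equal terms, the summary statistics can be recomputed from the FASTA files using a single definition of N50. The following is only a minimal sketch and is not part of Ray or Newbler. The function names `fasta_lengths` and `assembly_stats` are hypothetical, and it assumes that N50 is the length of the contig at which the cumulative length, counted from the largest contig down, first reaches half of the total. Newbler and Ray may compute their reported figures slightly differently.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines.

    `lines` can be any iterable of strings, for example an open file handle.
    """
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header closes the previous record and opens a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Summarize contig lengths: count, total bases, average, N50, largest."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        # Assumed definition: first contig at which the running sum
        # reaches at least half of the total assembled bases.
        if 2 * running >= total:
            n50 = length
            break
    return {
        "count": len(ordered),
        "total": total,
        "average": total // len(ordered),
        "n50": n50,
        "largest": ordered[0],
    }
```

For example, `assembly_stats(fasta_lengths(open("AllContigs.fasta")))` would summarize the Newbler contigs, and the same call on the contigs FASTA written by Ray would give directly comparable figures.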
Any suggestions for having Ray deal with assembly of the repetitive regions without breaking up these large contigs would be most welcome!

Thanks,
Sam