Hi Sebastien and denovoassemblers,

I recently started experimenting with Ray. It seems pretty cool and
orders of magnitude more memory efficient than other assemblers I have
used in the past. Unfortunately, the results I am getting are a bit odd.

I am working on assembling a bacterial genome, and have assembled some
large contigs (max size = 600 kb) using 454 SE reads and Newbler. I have
~20x coverage, and optical mapping results (basically a fancy
restriction digest map) from the same genome suggest that my large
contigs are quite good: ~85% of the genome is covered by the large
contigs and there is no evidence of misassembly. I suspect that gaps
remain in my assembly due to small repetitive elements, and in fact a
coverage analysis suggests that some contigs have as many as 8 copies in
the genome! The documentation for Ray suggests that it does quite well
at resolving these repetitive elements, so I was excited to try it on my
data.

I am using Ray 1.7.0.

Initially I tried running Ray on the assembled contigs alone, to see
whether it would resolve any of the overlaps identified through optical
mapping.

The assembled contigs have these stats:
                numberOfContigs   = 85;
                numberOfBases     = 6302194;

                avgContigSize     = 74143;
                N50ContigSize     = 133881;
                largestContigSize = 613481;

                Q40PlusBases      = 6290676, 99.82%;
                Q39MinusBases     = 11518, 0.18%;
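
For anyone comparing the N50 values in this post, here is a minimal Python sketch of the usual N50 definition (the length of the contig at which the running total, taken from largest to smallest, first reaches half of all assembled bases). I am assuming both Newbler and Ray report N50 this way; the function name and the example lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: sort contigs from longest to shortest and return
    the length of the contig at which the cumulative sum first reaches
    at least half of the total number of bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("cannot compute N50 of an empty set of contigs")


# Hypothetical contig lengths, not taken from my assembly.
example_lengths = [10, 8, 6, 4, 2]
print("N50 of example:", n50(example_lengths))
```

Because N50 depends only on the sorted length list, fragmenting a few very long contigs into many short ones lowers it sharply even when the total number of bases barely changes.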



This attempt quickly generated an assembler panic:

$ Ray -s AllContigs.fasta -out test1
------------------------------------------------------------------------------------
***
Step: K-mer counting
Date: Fri Jan 27 10:07:32 2012
Elapsed time: 5 seconds
Since beginning: 6 seconds
***


Rank 0 has 16364 k-mers (completed)


Rank 0: the minimum coverage is 2
Rank 0: the peak coverage is 2
Rank 0: Assembler panic: no peak observed in the k-mer coverage distribution.
Rank 0: to deal with the sequencing error rate, try to lower the k-mer
length (-k)
Rank 0: sent 10431 messages, received 10430 messages.
------------------------------------------------------------------------------------
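
My guess at what is going on in this first run (an assumption on my part, not something the log confirms): a de Bruijn assembler looks for a peak in the k-mer coverage distribution, and contigs on their own contain almost every k-mer about once, so there is no peak to find. The Python sketch below illustrates the idea on a small random sequence. It is not Ray's actual algorithm; the function name, the toy genome, the read length, and k = 21 are all hypothetical choices, and reverse complements are ignored.

```python
import random
from collections import Counter


def kmer_coverage_histogram(sequences, k):
    """Map each coverage value to the number of distinct k-mers seen that often."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return Counter(counts.values())


def mean_coverage(histogram):
    """Average number of occurrences per distinct k-mer."""
    occurrences = sum(cov * n for cov, n in histogram.items())
    return occurrences / sum(histogram.values())


random.seed(0)
K = 21
READ_LEN = 100

# Toy "genome" standing in for a finished contig (hypothetical data).
genome = "".join(random.choice("ACGT") for _ in range(2000))

# Case 1: the contig alone, as in my first run.
contig_hist = kmer_coverage_histogram([genome], K)

# Case 2: error-free reads sampled at roughly 20x depth.
n_reads = 20 * len(genome) // READ_LEN
reads = []
for _ in range(n_reads):
    start = random.randrange(len(genome) - READ_LEN + 1)
    reads.append(genome[start:start + READ_LEN])
read_hist = kmer_coverage_histogram(reads, K)

print("contig-only mean k-mer coverage:", mean_coverage(contig_hist))
print("read-set mean k-mer coverage:   ", mean_coverage(read_hist))
```

In the contig-only case nearly all k-mers sit at coverage 1, while the simulated reads give a distribution centred well above 1, which is the kind of peak the panic message says is missing.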

I then tried adding the original set of 454 reads as well as the contigs:
Ray -s AllContigs.fasta -s Sample1_Reads.fasta -o test2

This allowed Ray to run; however, the results are not quite what I expected.
OutputNumbers.txt:
Contigs >= 100 nt
 Number: 2847
 Total length: 6101858
 Average: 2143
 N50: 4105
 Median: 1236
 Largest: 29655
Contigs >= 500 nt
 Number: 2057
 Total length: 5896767
 Average: 2866
 N50: 4349
 Median: 1876
 Largest: 29655
Scaffolds >= 100 nt
 Number: 2847
 Total length: 6101858
 Average: 2143
 N50: 4105
 Median: 1236
 Largest: 29655
Scaffolds >= 500 nt
 Number: 2057
 Total length: 5896767
 Average: 2866
 N50: 4349
 Median: 1876
 Largest: 29655

So, I'm not exactly using Ray for its designed purpose, but I am
curious about why it is breaking apart my large contigs and producing
an assembly with fewer assembled bases than I originally fed it. Any
suggestions for having Ray deal with assembly of the repetitive
regions without breaking up these large contigs would be most welcome!
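
To put numbers on the difference, here is a quick Python calculation using only the figures reported above (the Newbler contig stats and the "Contigs >= 100 nt" block of OutputNumbers.txt). The variable names are mine.

```python
# Figures copied from the two reports in this message.
newbler_bases = 6302194   # numberOfBases from the Newbler contig stats
newbler_n50 = 133881      # N50ContigSize from the Newbler contig stats
ray_bases = 6101858       # Total length, Ray contigs >= 100 nt
ray_n50 = 4105            # N50, Ray contigs >= 100 nt

missing_bases = newbler_bases - ray_bases
missing_fraction = missing_bases / newbler_bases
n50_ratio = newbler_n50 / ray_n50

print(f"bases missing from the Ray assembly: {missing_bases}")
print(f"fraction of input bases missing:     {missing_fraction:.2%}")
print(f"Newbler N50 / Ray N50:               {n50_ratio:.1f}")
```

The Ray total counts only contigs of at least 100 nt, so part of the shortfall may simply be sequence in shorter fragments.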

thanks,

Sam
