Hello,

Lionel Frangeul wrote:
> Hi,
>
> I am trying to assemble small viral genomes from cell cultures with Illumina reads.
> Sometimes the complete genome of the host is not available and
> I try to assemble all the reads (host + virus) to obtain my contigs.
> In this case I have 2 important problems with all the NGS assemblers:
> 1) The coverage of my viral contigs is much higher than that of the hundred
> thousand
> contigs of the host (300X to 8000X compared to 3X-30X for the host).

The default maximum coverage of Ray is 4294967295, so this should not be a problem.

There is also an option to ignore anything with too much coverage:

 -use-maximum-seed-coverage
   Ignores any seed with a coverage depth above this threshold.
   The default is 4294967295.

This option was added recently and is not in v2.0.0.

> 2) The coverage of the viral genome is also very heterogeneous (don't know
> why).

This may be caused by biases in library preparation. Did you use Nextera?

>
> In the example enclosed (see fig) I obtain 4 contigs of my virus with Ray2.0
> assembly (k30 = best results, tried k21 and k35).

If you provided 30, Ray used 29, because Ray cannot assemble anything with an even k-mer value.
For k=35, did you recompile with MAXKMERLENGTH=64? If you did not, k=31 was used.

> Very good results compared to other assemblers without any supplementary options to
> configure.

Thank you!

>
> but I don't know why these contigs are not grouped into only one contig.
> When I search the location of these contigs by blastn on my viral genomes, I
> see that
> 3 of these contigs overlap by 53 and 87 bases with 100 % identity and there
> are many reads over all the junctions.
>

Interesting. Did you use paired reads? If so, what kind? (See RayOutput/LibraryStatistics.txt.)

> Do you have an idea of the cause of this fragmentation ?
>

Your figure shows uneven coverage across the genome. The new algorithms in Ray (called Ray Méta) work locally in the de Bruijn graph to infer the coverage distribution. From what I can see, the fragmentation may be due to the local coverage distributions differing too much between the fragments (contigs) in your assembly.

P.S. Very nice figure!
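
To make the k-mer rules mentioned above concrete (an even k is lowered by one, and k=35 without recompiling falls back to 31), here is a minimal sketch in Python. It is not Ray's code: the function name `effective_kmer_length` is hypothetical, the default of 32 for MAXKMERLENGTH is an assumption consistent with the k=31 fallback, and the order in which the two adjustments are applied is also assumed.

```python
def effective_kmer_length(requested_k: int, max_kmer_length: int = 32) -> int:
    """Illustrative only: the k-mer length that would end up being used.

    Assumptions (not taken from Ray's source code):
      - an even k is lowered to k - 1;
      - k is capped at the largest odd value below max_kmer_length,
        which gives 31 for an assumed default MAXKMERLENGTH of 32.
    """
    k = requested_k

    # An even k-mer length is not usable, so drop to the next odd value.
    if k % 2 == 0:
        k -= 1

    # Largest odd k that fits under the compile-time limit.
    largest_allowed = max_kmer_length - 1
    if largest_allowed % 2 == 0:
        largest_allowed -= 1

    return min(k, largest_allowed)


if __name__ == "__main__":
    for requested in (21, 30, 35):
        print(requested, "->", effective_kmer_length(requested))
    print("35 with MAXKMERLENGTH=64 ->", effective_kmer_length(35, 64))
```

Under these assumptions, the three values tried (21, 30, 35) would map to 21, 29 and 31 with a default build, so the k=30 and k=35 runs were closer to each other than they looked.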
Sébastien Boisvert

> TIA
>
> P.S.: the genome starts and ends with repeats.

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users