Hi,

On 18/03/13 12:20 PM, Adrian Pelin wrote:
> Hello,
>
> It seems like the answer I got from the velvet mailing list for this issue is
> that there is no solution.
> Is there a strategy I could use with Ray to avoid getting the following
> issue?
>
> My organism seems to be full of SNPs in a perfect 50/50 ratio which is
> probably due to it being diploid. My experience with assembling velvet
> data is that it generates multiple contigs with very high nucleotide
> identity between some contigs. The only differences are SNPs.
>
> I was wondering, is there any way to assemble only the haploid genome
> for a start? I am afraid to overestimate the haploid genome size. Also,
> velvet doesn't generate identical contigs for each piece of sequence,
> just in some cases there are giant contigs over a few kb overlapping.
>
> Any strategy to avoid this or remove these from assembly? My data is
> MiSeq fragments 300bp and hiseq mate pair jumping lib 3kb.
>

I happen to be working on exactly this problem in Ray today (I have been
working on it for a few weeks now). See these two tickets:

* https://github.com/sebhtml/ray/issues/136
* https://github.com/sebhtml/ray/issues/153

The thing is that in a de Bruijn graph (such as the one in Velvet or Ray),
a variation of one nucleotide leads to two alternate branches, each
containing k vertices (one for each k-mer that overlaps the variant position).

A typical SNP in a de Bruijn graph (in Ray Cloud Browser):

=> http://genome.ulaval.ca:10111/client/?map=0&section=0&region=1&location=132207&zoom=1.191270483217418

From an algorithmic point of view, if you use a large k-mer length,
assemblers will spawn a contig for each allele because each branch will be
"good enough". Therefore, some of these assembly seeds need to be filtered
out. As far as I know, all de Bruijn assemblers have this problem right now
with large k-mers.

The two issues above should be fixed this week by this new plugin in Ray:

=> https://github.com/sebhtml/ray/tree/master/code/SpuriousSeedAnnihilator

As its name suggests, SpuriousSeedAnnihilator will annihilate spurious seeds
that would otherwise lead to duplicated genetic regions.

-Séb

> Adrian
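The claim that one substituted nucleotide produces two alternate branches of k vertices each can be checked on a toy example. The sketch below makes some assumptions. It builds a random 300-bp haplotype with no repeats. It copies that haplotype with a single substitution to stand in for the second allele. It then collects the k-mers (vertices) that occur on only one allele. The helpers `make_alleles`, `bubble_branches` and `same_bubble` are hypothetical illustrations. They are not Ray code, and they do not reproduce whatever SpuriousSeedAnnihilator actually does.

```python
import random

BASES = "ACGT"


def kmers(seq, k):
    """All k-mers of seq in order; in a k-mer de Bruijn graph these are the vertices of its path."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def make_alleles(length, snp_pos, seed=0):
    """Hypothetical toy diploid locus: a random haplotype and a copy with one substitution."""
    rng = random.Random(seed)
    hap_a = "".join(rng.choice(BASES) for _ in range(length))
    ref = hap_a[snp_pos]
    alt = BASES[(BASES.index(ref) + 1) % 4]  # any base different from ref
    hap_b = hap_a[:snp_pos] + alt + hap_a[snp_pos + 1:]
    return hap_a, hap_b


def bubble_branches(hap_a, hap_b, k):
    """Vertices found on one haplotype but not the other: the two branches of the SNP bubble."""
    in_a, in_b = set(kmers(hap_a, k)), set(kmers(hap_b, k))
    branch_a = [x for x in kmers(hap_a, k) if x not in in_b]
    branch_b = [x for x in kmers(hap_b, k) if x not in in_a]
    return branch_a, branch_b


def same_bubble(branch_a, branch_b):
    """Hypothetical check: both branches leave from the same (k-1)-mer and rejoin at the same (k-1)-mer."""
    return (len(branch_a) == len(branch_b) > 0
            and branch_a[0][:-1] == branch_b[0][:-1]
            and branch_a[-1][1:] == branch_b[-1][1:])


if __name__ == "__main__":
    hap_a, hap_b = make_alleles(length=300, snp_pos=150)
    for k in (21, 31, 61):
        a, b = bubble_branches(hap_a, hap_b, k)
        # For a non-repetitive toy sequence, each branch should hold k vertices.
        print(k, len(a), len(b), same_bubble(a, b))
```

Each branch grows linearly with k, so at a large k-mer length both alleles form long, unbranched paths. That fits the explanation of why each branch can be "good enough" to seed its own contig. A filter could use a bubble test like `same_bubble` to keep only one branch per SNP. However, that test is an assumption made for illustration, not the plugin's documented method.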
_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users