Hi,

On 18/03/13 12:20 PM, Adrian Pelin wrote:
> Hello,
>
> It seems like the answer I got from the velvet mailing list for this issue is 
> that there is no solution.
> Is there a strategy I could use with Ray to avoid the following
> issue?
>
> My organism seems to be full of SNPs in a perfect 50/50 ratio which is
> probably due to it being diploid. My experience with assembling velvet
> data is that it generates multiple contigs with very high nucleotide
> identity between some contigs. The only differences are SNPs.
>
> I was wondering, is there any way to assemble only the haploid genome
> for a start? I am afraid to overestimate the haploid genome size. Also,
> velvet doesn't generate identical contigs for each piece of sequence,
> just in some cases there are giant contigs over a few kb overlapping.
>
> Any strategy to avoid this or remove these from assembly? My data is
> MiSeq fragments 300bp and HiSeq mate pair jumping lib 3kb.
>

I happen to be working on exactly this problem in Ray today
(I have been working on that for a few weeks now).


See these two tickets:

* https://github.com/sebhtml/ray/issues/136

* https://github.com/sebhtml/ray/issues/153


The thing is that in a de Bruijn graph (such as the one in Velvet or Ray),
a variation of one nucleotide leads to alternate branches, each containing
k vertices.
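
To make the "k vertices" point concrete, here is a minimal Python sketch. It is
not code from Ray or Velvet; the two allele sequences, the value of k, and the
helper names are made up for illustration. It lists the k-mers (vertices) of
two alleles that differ by a single SNP and counts how many are specific to
each allele. The count equals k only when the SNP is at least k-1 bases away
from both sequence ends and none of the affected k-mers occurs elsewhere, which
holds for this toy input.

```python
def kmers(seq, k):
    """Return the k-mers (de Bruijn graph vertices) of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def allele_specific_vertices(allele_a, allele_b, k):
    """Return (k-mers only in allele_a, k-mers only in allele_b)."""
    a = set(kmers(allele_a, k))
    b = set(kmers(allele_b, k))
    return a - b, b - a


# Hypothetical alleles: identical except for one SNP (A/T) at 0-based
# position 10. These sequences are invented for the example.
allele_a = "ACGTTGCATGACCGTAGGCTAACGT"
allele_b = "ACGTTGCATGTCCGTAGGCTAACGT"
k = 5  # toy value; real assemblies use much larger k

only_a, only_b = allele_specific_vertices(allele_a, allele_b, k)

# Each allele contributes its own branch of vertices around the SNP.
print("branch A:", sorted(only_a))
print("branch B:", sorted(only_b))
print("vertices per branch:", len(only_a), len(only_b))
```

Every k-mer that overlaps the SNP position differs between the two alleles, so
the graph splits into two parallel branches that rejoin once the k-mers no
longer cover the variant.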


A typical SNP in a de Bruijn graph (in Ray Cloud Browser):

    => http://genome.ulaval.ca:10111/client/?map=0&section=0&region=1&location=132207&zoom=1.191270483217418



From an algorithmic point of view, if you use a large k-mer length,
assemblers will spawn contigs for each allele because each branch will be
"good enough".

Therefore, some of these assembly seeds need to be filtered out. As far as
I know, all de Bruijn assemblers have this problem right now with large
k-mer lengths.



The two issues above should be fixed this week by this new plugin in Ray:

    => https://github.com/sebhtml/ray/tree/master/code/SpuriousSeedAnnihilator

As its name suggests, SpuriousSeedAnnihilator will annihilate spurious
seeds which would otherwise lead to duplicated genetic regions.

-Séb

> Adrian
>
>
>
>
> ------------------------------------------------------------------------------
> Everyone hates slow websites. So do we.
> Make your web apps faster with AppDynamics
> Download AppDynamics Lite for free today:
> http://p.sf.net/sfu/appdyn_d2d_mar
> _______________________________________________
> Denovoassembler-users mailing list
> Denovoassembler-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
>

