Very informative. Thanks.





On 3/6/13 10:31 AM, "Sébastien Boisvert" <[email protected]> wrote:

>Hello,
>
>[Please CC the user mailing list]
>
>On 03/05/2013 06:50 PM, Lo, Chien-Chi wrote:
>> Hi Sébastien,
>>
>> I have a question about how Ray weighs paired-end reads. I ran an
>> E. coli MiSeq paired-end data set through Ray 2.1.0 twice.
>> First, I treated all reads as single-end; second, I ran Ray in
>> paired-end mode. The commands I used are pasted below. The numbers of
>> resulting contigs are very different. Why does paired-end mode
>> generate a better result?
>
>Let me tell you a short story.
>
>With your DNA reads, Ray builds a graph. The graph is distributed
>uniformly across all the MPI ranks.
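As an illustration only (this is not Ray's code), here is a minimal Python sketch of the idea that every vertex of a k-mer graph lives on exactly one MPI rank. It makes three assumptions:

- The graph is a plain forward-strand k-mer graph. Reverse complements are ignored, although a real assembler must handle both strands.
- CRC32 modulo the rank count stands in for whatever hash Ray actually uses.
- The names `distribute_graph` and `owner_rank` are hypothetical.

```python
import zlib


def kmers(read, k):
    """Yield every k-mer (length-k substring) of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def owner_rank(kmer, num_ranks):
    """Return the rank that owns a k-mer.

    Assumption: CRC32 modulo the number of ranks is a stand-in hash chosen
    for illustration; it is deterministic and spreads k-mers roughly evenly.
    """
    return zlib.crc32(kmer.encode("ascii")) % num_ranks


def distribute_graph(reads, k, num_ranks):
    """Build a forward-strand k-mer graph split across num_ranks partitions.

    Each partition maps a k-mer (vertex) to the set of k-mers that follow
    it in some read (its outgoing edges). A vertex and its outgoing edges
    are stored only on the vertex's owner rank. Reverse complements are
    ignored to keep the sketch short.
    """
    partitions = [dict() for _ in range(num_ranks)]
    for read in reads:
        previous = None
        for kmer in kmers(read, k):
            partitions[owner_rank(kmer, num_ranks)].setdefault(kmer, set())
            if previous is not None:
                # The edge previous -> kmer is recorded where previous lives.
                partitions[owner_rank(previous, num_ranks)][previous].add(kmer)
            previous = kmer
    return partitions


if __name__ == "__main__":
    parts = distribute_graph(["ACGTACGTGACC", "GTGACCTTAG"], k=5, num_ranks=4)
    for rank, vertices in enumerate(parts):
        print(rank, sorted(vertices))
```

In a distributed setting, following an edge whose target lives on another rank means sending a message. This is why a traversal generates so much MPI traffic.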
>
>Until recently, it was impossible to view such a graph with ease.
>
>Here's such a graph for Escherichia coli DH10B sequenced on an Illumina(R)
>MiSeq(R) 2x250 and assembled with Ray using a k-mer length of 91.
>
>     http://browser.cloud.raytrek.com/client/?map=0&section=3&region=4&location=178944
>
>
>Once this graph is built, the Ray MPI ranks begin an extraordinary
>journey in which they collectively, as a tribe, perform parallel graph
>traversals using heuristics. This journey is extraordinary because of
>the sheer number of messages passed between MPI ranks: the computation
>granularity is on the order of 10-70 microseconds.
>
>The heuristics in Ray use paired reads, mate pairs, and single-end read
>threading to perform the graph traversals. So Ray uses paired
>information for purposes other than just scaffolding. Some other
>assemblers, such as ABySS, do this too.
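To make the idea of paired reads steering a traversal more concrete, here is a hypothetical Python sketch of a single decision at a branch point. It is a simplification under stated assumptions, not the rules described in the Ray publications:

- Each candidate successor gets a support score from paired reads, for example how many reads have a mate already placed on the current path that agrees with that branch.
- The best candidate is taken only if it beats every rival by a fixed factor. The factor 2.0 is arbitrary.
- The name `choose_successor` is made up.

```python
def choose_successor(support, dominance=2.0):
    """Pick the outgoing branch that paired reads clearly favour.

    support maps each candidate successor vertex to a score, e.g. the
    number of reads whose mates, already placed on the path being
    extended, agree with that branch. The best branch wins only if its
    score is at least `dominance` times every other score; otherwise the
    choice is ambiguous and None is returned (the traversal would stop).
    Assumption: the factor 2.0 is illustrative, not Ray's actual setting.
    """
    if not support:
        return None
    ranked = sorted(support.items(), key=lambda item: item[1], reverse=True)
    best, best_score = ranked[0]
    if best_score == 0:
        return None
    if any(best_score < dominance * score for _, score in ranked[1:]):
        return None
    return best


if __name__ == "__main__":
    print(choose_successor({"ACGTT": 12, "ACGTA": 3}))  # prints ACGTT
    print(choose_successor({"ACGTT": 5, "ACGTA": 4}))   # prints None
```

With single-end reads only, the support for each branch has to come from reads that span the branch point directly. Mates add evidence from much farther away, so more repeats can be resolved.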
>
>
>These original Ray heuristics are described in this open-access
>publication:
>
>     http://online.liebertpub.com/doi/abs/10.1089/cmb.2009.0238
>
>
>Recently, our group has generalized these heuristics and other parts of
>the algorithms to handle mixes of genomes. The new algorithms work well
>on bacterial genomes, on metagenomes, and also on transcriptomes.
>However, the algorithms probably do not handle alternative splicing
>very well.
>
>These generalizations are described in this open-access publication:
>
>     http://genomebiology.com/2012/13/12/R122/abstract
>
>
>> Thanks!
>>
>> For single-end reads:
>>    Contigs >= 500 nt
>>    Number: 583
>>    Total length: 4357532
>>    Average: 7474
>>    N50: 10494
>>    Median: 5905
>>    Largest: 38443
>> For paired-end reads:
>>    Contigs >= 500 nt
>>    Number: 87
>>    Total length: 4603370
>>    Average: 52912
>>    N50: 106053
>>    Median: 35487
>>    Largest: 269165
>>
>>
>
>Also, at the moment, the algorithms work better with pairs of reads
>than with single-end reads, because they match the outer distances
>against the empirical distribution of the signal for each library.
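The per-library matching can be pictured with a short Python sketch, which rests on these assumptions:

- Each library's outer distances are summarized by a mean and a standard deviation. Both are estimated from pairs whose two mates land on the same contig.
- A candidate pair placement is accepted only if its outer distance falls within three standard deviations of that mean.
- Ray's actual test may differ. The function names and the 3-SD window are hypothetical.

```python
import statistics


def outer_distance(forward_start, reverse_end):
    """Outer distance of a pair whose mates align to the same contig.

    forward_start: 0-based position of the first base of the forward read.
    reverse_end: 0-based position of the last base of the reverse read.
    """
    return reverse_end - forward_start + 1


def library_profile(distances):
    """Return (mean, population standard deviation) of observed outer distances."""
    values = list(distances)
    return statistics.mean(values), statistics.pstdev(values)


def is_consistent(distance, profile, tolerance=3.0):
    """True if distance lies within `tolerance` standard deviations of the mean.

    Assumption: a mean +/- 3 SD window is an illustrative acceptance rule,
    not necessarily the test Ray applies.
    """
    mean, sd = profile
    return abs(distance - mean) <= tolerance * sd


if __name__ == "__main__":
    # Hypothetical outer distances observed for one library.
    profile = library_profile([300, 310, 290, 305, 295])
    print(is_consistent(315, profile))  # prints True
    print(is_consistent(340, profile))  # prints False
```

Keeping one profile per library matters because libraries with different insert sizes would otherwise blur into a single, much wider distribution.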
>
>
>We devised some other neat algorithms for pairs, such as this one:
>
>
>     Constrained traversal of repeats with paired sequences.
>     Sébastien Boisvert, Élénie Godzaridis, François Laviolette & Jacques Corbeil.
>     First Annual RECOMB Satellite Workshop on Massively Parallel Sequencing,
>     March 26-27 2011, Vancouver, BC, Canada.
>     abstract: http://boisvert.info/publications/RECOMB-seq-2011-abstract.html
>     presentation: http://www.boisvert.info/dropbox/recomb-seq-2011-talk.pdf
>
>As far as I know, read recycling as done in Ray is not implemented in
>other assemblers.
>
>I fixed a very rare bug in the read recycling code yesterday:
>https://github.com/sebhtml/ray/commit/354c02bc7f3e963fb22809c3a5176e5f8d6cba26
>
>
>For long reads, the matching algorithms are not designed to handle
>insertions very well, except when each insertion can be matched with a
>deletion (and vice versa).
>
>>
>>
>>
>>
>>
>> ### Command 1 ####
>> mpiexec -n 16 Ray \
>>   -k 31 \
>>   -s MiSeq_Ecoli_MG1655_110721_PF_R1.fastq \
>>   -s MiSeq_Ecoli_MG1655_110721_PF_R2.fastq \
>>   -o Ray_single
>> ####################
>>
>>
>
>
>Did you know that Ray natively accepts compressed files, such as
>.fastq.bz2 or .fastq.gz?
>
>You just need to compile with HAVE_LIBZ=y (for .gz) and/or
>HAVE_LIBBZ2=y (for .bz2).
>
>  
>> ### Command 2 ####
>> mpiexec -n 16 Ray \
>>   -k 31 \
>>   -p MiSeq_Ecoli_MG1655_110721_PF_R1.fastq \
>>      MiSeq_Ecoli_MG1655_110721_PF_R2.fastq \
>>   -o Ray_paired
>> ##################
>>
>>
>>
>
>That's a really nice test. It was easy for me to understand what you
>did just by reading your message.
>  
>>
>>
>>
>> Chien-Chi Lo
>> Research Technologist
>> Los Alamos National Laboratory
>>
>


_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
