Very informative. Thanks.
On 3/6/13 10:31 AM, "Sébastien Boisvert" <[email protected]> wrote:

>Hello,
>
>[Please CC the user mailing list]
>
>On 03/05/2013 06:50 PM, Lo, Chien-Chi wrote:
>> Hi Sébastien,
>>
>> I have a question about how Ray weights paired-end reads. I ran an
>> E. coli MiSeq paired-end data set using Ray 2.1.0 twice. First, I
>> treated all reads as single-ended, and second, I ran Ray in
>> paired-end mode. The commands I used are pasted below. The resulting
>> contig counts are very different. Why does paired-end mode generate
>> a better result?
>
>Let me tell you a short story.
>
>With your DNA reads, Ray builds a graph. The graph is distributed
>uniformly onto all the MPI ranks.
>
>Until recently, it was impossible to see such a graph with energy.
>
>Here's such a graph for Escherichia coli DH10B sequenced on an
>Illumina(R) MiSeq(R) 2x250 and assembled with Ray using a k-mer length
>of 91.
>
>http://browser.cloud.raytrek.com/client/?map=0&section=3&region=4&location=178944
>
>Once this graph is built, the Ray MPI ranks will begin an extraordinary
>journey in which they will collectively, as a tribe, perform parallel
>graph traversals using heuristics. This journey is extraordinary
>because of the sheer amount of messages that are passed between MPI
>ranks -- the computation granularity is on the order of 10-70
>microseconds.
>
>The heuristics in Ray use paired reads, mate pairs, and single-end read
>threading to perform the graph traversals. So Ray uses paired
>information for other purposes than just scaffolding. This is also the
>case with some other assemblers, like ABySS.
>
>These original Ray heuristics are described in this open-access
>publication:
>
>  http://online.liebertpub.com/doi/abs/10.1089/cmb.2009.0238
>
>Recently, our group has generalized these heuristics and other parts of
>the algorithms to handle mixes of genomes. The new algorithms work well
>on bacterial genomes, on metagenomes, and also on transcriptomes.
>However, the algorithms probably do not handle alternative splicing
>very well.
>
>These generalizations are described in this open-access publication:
>
>  http://genomebiology.com/2012/13/12/R122/abstract
>
>> Thanks!
>>
>> For single-ended,
>> Contigs >= 500 nt
>> Number: 583
>> Total length: 4357532
>> Average: 7474
>> N50: 10494
>> Median: 5905
>> Largest: 38443
>>
>> For paired-ended,
>> Contigs >= 500 nt
>> Number: 87
>> Total length: 4603370
>> Average: 52912
>> N50: 106053
>> Median: 35487
>> Largest: 269165
>
>Also, at the moment, the algorithms are better with pairs of reads than
>with single-end reads because the algorithms will match the outer
>distances to the empirical distribution of signal for each library.
>
>We devised some other neat algorithms on pairs, such as this:
>
>  Constrained traversal of repeats with paired sequences.
>  Sébastien Boisvert, Élénie Godzaridis, François Laviolette & Jacques Corbeil.
>  First Annual RECOMB Satellite Workshop on Massively Parallel Sequencing,
>  March 26-27 2011, Vancouver, BC, Canada.
>  abstract: http://boisvert.info/publications/RECOMB-seq-2011-abstract.html
>  presentation: http://www.boisvert.info/dropbox/recomb-seq-2011-talk.pdf
>
>The read recycling in Ray is something that is not implemented in other
>assemblers, as far as I know.
>
>I fixed a very rare bug in the read recycling code yesterday --
>https://github.com/sebhtml/ray/commit/354c02bc7f3e963fb22809c3a5176e5f8d6cba26
>
>For long reads, the matching algorithms are not devised to handle
>insertions very well, except in the case where any insertion can be
>mated with a deletion (and vice versa).
>
>> ### Command 1 ####
>> mpiexec -n 16 Ray \
>> -k \
>> 31 \
>> -s \
>> MiSeq_Ecoli_MG1655_110721_PF_R1.fastq \
>> -s \
>> MiSeq_Ecoli_MG1655_110721_PF_R2.fastq \
>> -o \
>> Ray_single
>> ####################
>
>Did you know that Ray will natively accept compressed files, such as
>.fastq.bz2 or .fastq.gz?
>
>You just need to compile with HAVE_LIBZ=y (for .gz) and/or with
>HAVE_LIBBZ2=y (for .bz2).
>
>> ### Command 2 ####
>> mpiexec -n 16 Ray \
>> -k \
>> 31 \
>> -p \
>> MiSeq_Ecoli_MG1655_110721_PF_R1.fastq \
>> MiSeq_Ecoli_MG1655_110721_PF_R2.fastq \
>> -o \
>> Ray_paired
>> ##################
>
>That's really a nice test. It was really easy for me to understand what
>you did just by reading your message.
>
>> Chien-Chi Lo
>> Research Technologist
>> Los Alamos National Laboratory

_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users

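For readers comparing the two sets of contig statistics quoted in this thread, here is a minimal sketch of how an N50 value is usually computed from a list of contig lengths. It is not Ray's own code. The function name n50, the min_length parameter (set to 500 to mirror the "Contigs >= 500 nt" filter in the reports), and the example lengths are all assumptions made for illustration.

```python
def n50(lengths, min_length=500):
    """Return the N50 of the contigs that are at least min_length long.

    N50 is the length of the contig at which the running total, taken
    from the longest contig downward, first reaches half of the summed
    length of all kept contigs.
    """
    # Keep only contigs passing the length filter, longest first.
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs at or above min_length")

    half = sum(kept) / 2
    running = 0
    for n in kept:
        running += n
        if running >= half:
            return n


# Hypothetical contig lengths, not taken from the assemblies above.
example = [2000, 1500, 1000, 800, 700, 300]
print(n50(example))
```

A higher N50 together with a lower contig count, as in the paired-end run, means that more of the assembly sits in long contigs.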