On Mon, 2012-03-05 at 22:11 -0500, LIU wrote:
> Thanks very much for your explanation about the k-mer myth.
> 
> 
> The paired reads are not equal in length because I trimmed the reads
> based on quality. Specifically, I trimmed them using this filtering
> criterion: 15 consecutive bases with a Phred quality score higher
> than 15. Some paired-end reads were also broken.
> 


> 
> I found only 31/Library1.txt:
> 293 1
> 347 1
> 359 1
> 391 1 
> 

This shows that Ray sees no paired reads in your data.
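The minimal Python sketch below reads a Library*.txt distribution and counts how many pairs it holds. It assumes each line is an insert length followed by the number of pairs observed at that length. That matches the quoted lines, but the format is not confirmed here. `load_distribution` and `summarize` are hypothetical helper names, not part of Ray.

```python
from typing import Dict, Iterable


def load_distribution(lines: Iterable[str]) -> Dict[int, int]:
    """Parse 'insert_length count' lines (assumed format) into a dict."""
    dist: Dict[int, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        length, count = int(fields[0]), int(fields[1])
        dist[length] = dist.get(length, 0) + count
    return dist


def summarize(dist: Dict[int, int]) -> str:
    """Report the number of observed pairs and their mean insert length."""
    total = sum(dist.values())
    if total == 0:
        return "no pairs observed"
    mean = sum(length * count for length, count in dist.items()) / total
    return f"{total} pairs, mean insert {mean:.1f}"


# The four lines quoted from 31/Library1.txt in this thread.
quoted = ["293 1", "347 1", "359 1", "391 1"]
print(summarize(load_distribution(quoted)))
```

For the quoted Library1.txt this reports four pairs. LibraryStatistics.txt lists 405911176 sequences for that library. To check a real file, pass the open file object: `with open("31/Library1.txt") as fh: print(summarize(load_distribution(fh)))`.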

> 
> I do not know whether I deleted the others, if they were produced.
> 
> 
> 
> 
> The last 10 lines of 31/SeedLengthDistribution.txt are:
> 
> 
> 16892 1
> 17117 1
> 18662 1
> 19763 1
> 21295 1
> 23185 1
> 23416 1
> 25293 1
> 26018 1
> 28186 1
> 

That is just fine. Ray uses these long DNA sequences present in your
sample to estimate insert lengths for paired reads.

However, it seems that Ray is unable to gather enough signal for your
paired reads.

Can you try without trimming your reads? I suspect that the second
sequence is usually shorter than the k-mer length, and a second read
shorter than the k-mer length yields no k-mers at all, which makes it
useless for pairing.
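You can check that suspicion before rerunning with the minimal Python sketch below. It walks an interleaved FASTA file two records at a time, checks that the /1 and /2 names match, and counts pairs in which either mate is shorter than k. It assumes Illumina-style names ending in /1 and /2. `read_fasta`, `check_pairs` and the command-line usage are hypothetical, not part of Ray.

```python
import gzip
import sys
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) records, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_pairs(records: Iterable[Tuple[str, str]], k: int) -> Dict[str, int]:
    """Scan an interleaved file as consecutive (/1, /2) record pairs."""
    stats = {"pairs": 0, "name_mismatch": 0, "mate_shorter_than_k": 0}
    it = iter(records)
    # zip(it, it) takes records two at a time; a trailing unpaired record is dropped.
    for (h1, s1), (h2, s2) in zip(it, it):
        stats["pairs"] += 1
        # Assumes mate names differ only by a /1 or /2 suffix.
        if not (h1.endswith("/1") and h2.endswith("/2") and h1[:-2] == h2[:-2]):
            stats["name_mismatch"] += 1
        # A mate shorter than k contributes no k-mers at all.
        if len(s1) < k or len(s2) < k:
            stats["mate_shorter_than_k"] += 1
    return stats


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_pairs.py gs3.shuffled.fasta.gz 31
    path, k = sys.argv[1], int(sys.argv[2])
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        print(check_pairs(read_fasta(handle), k))
```

In the second pair quoted later in this thread, the /1 mate (TATATAGATAGATTTCA) is 17 bases long. At k=31 it would be counted as too short.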

> 
> Thanks.
> 
> 
> Best Regards,
> Huanle
> 
> 
> On Tue, Mar 6, 2012 at 12:34 PM, Sébastien Boisvert
> <sebastien.boisver...@ulaval.ca> wrote:
>         See my responses below.
>         
>         On Mon, 2012-03-05 at 18:53 -0500, LIU wrote:
>         > Hi ,
>         >
>         >
>         > Thanks for your response.
>         >
>         > On Tue, Mar 6, 2012 at 2:21 AM, Sébastien Boisvert
>         > <sebastien.boisver...@ulaval.ca> wrote:
>         >         1. Using a k-mer length of 71 will _presumably_ not work
>         >         very well because of sequencing errors. First do a test
>         >         run at k=31.
>         > Yes, I also ran k=31.
>         > It is the same case as with k=71.
>         > One more question about the choice of k-mer length.
>         > I was also told that a longer k-mer is supposed to produce a more
>         > accurate assembly, while shorter ones are more prone to
>         > sequencing errors.
>         > I am confused. Perhaps I should open another ticket to ask this
>         > question, but I would really appreciate your answer.
>         >
>         
>         
>         Using a longer k-mer length makes the k-mers more unique.
>         
>         Let's say that this is a read:
>         
>                                         *
>         
> TGTGTGGGTCAGTATGTAGTCCACCTGGAAATCTTCTTTTTCCAGATTTGCCCATCCTTCTTCGTCCTCTTCCCG
>         
>         
>         The '*' marks a sequencing error.
>         
>         For 71-mers, the sliding window is:
>         
>                                         *
>         
> TGTGTGGGTCAGTATGTAGTCCACCTGGAAATCTTCTTTTTCCAGATTTGCCCATCCTTCTTCGTCCTCTTCCCG
>         
>         
> kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
>         
>         So basically all the k-mers generated from that sliding window
>         contain the sequencing error.
>         
>         
>         For 31-mers, the sliding window is:
>         
>                                         *
>         
> TGTGTGGGTCAGTATGTAGTCCACCTGGAAATCTTCTTTTTCCAGATTTGCCCATCCTTCTTCGTCCTCTTCCCG
>         
>         kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
>         
>         
>         So with 31-mers, you will get some erroneous k-mers and some
>         genuine k-mers.
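The sliding-window argument can be checked by counting. The small Python sketch below counts, for one read, how many k-mers overlap a single error position. The error position (0-based 37, roughly mid-read) is a hypothetical stand-in, because the exact column of the '*' cannot be recovered from the quoted text. `kmers_touching_error` is an illustrative name.

```python
from typing import Tuple


def kmers_touching_error(read_length: int, error_pos: int, k: int) -> Tuple[int, int]:
    """Return (total k-mers, k-mers containing error_pos) for one read."""
    total = max(0, read_length - k + 1)
    if total == 0:
        return 0, 0  # read shorter than k: no k-mers at all
    first = max(0, error_pos - k + 1)       # leftmost window start that covers the error
    last = min(error_pos, read_length - k)  # rightmost window start that covers the error
    return total, max(0, last - first + 1)


# The example read from this thread; the error position is a hypothetical choice.
READ = "TGTGTGGGTCAGTATGTAGTCCACCTGGAAATCTTCTTTTTCCAGATTTGCCCATCCTTCTTCGTCCTCTTCCCG"
ERROR_POS = 37

for k in (31, 71):
    total, bad = kmers_touching_error(len(READ), ERROR_POS, k)
    print(f"k={k}: {total} k-mers, {bad} contain the error, {total - bad} are error-free")
```

With that assumed position, the 75-base read yields 5 71-mers, all of which contain the error. It yields 45 31-mers, 14 of which are error-free.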
>         
>         
>         >
>         >
>         >
>         >         2. Are your interleaved files properly generated ?
>         >
>         >         sequence1/1
>         >         sequence1/2
>         >         sequence2/1
>         >         sequence2/2
>         >         sequence3/1
>         >         sequence3/2
>         >         Yes, I think my sequences are correctly interleaved. E.g.,
>         > >@AGRF-21_0011_FC64J74AAXX:2:1:1804:936#CCGACT/1
>         > TACATATATACATGATACATACATACATGATATATTCATATGTCACCTAAGGATGTATCATACATGATACATACATCCATGATACATACATACCG
>         > >@AGRF-21_0011_FC64J74AAXX:2:1:1804:936#CCGACT/2
>         > GATGTATGTATCATGTATGATACATCCTTAGGTGACATATGAATATATCATGTATGTATGTATCATGTATATATGTATAAATATGTAT
>         > >@AGRF-21_0011_FC64J74AAXX:2:1:1983:932#AATTAA/1
>         > TATATAGATAGATTTCA
>         > >@AGRF-21_0011_FC64J74AAXX:2:1:1983:932#AATTAA/2
>         > CTTTTTTTTTGTTTCAGTCCCCGTGCTTTCAAAATTGCCCGGGTTCAGTCCCTAAGTCGTTAAGTCCGTT
>         > In fact, I also tried Velvet. It produced different contigs and
>         > scaffolds. But of course Ray and Velvet may not be directly
>         > comparable because of their different scaffolding strategies (I do
>         > not know this; it's simply a guess).
>         
>         
>         This looks OK.
>         
>         But why is the second sequence shorter than the first one?
>         
>         Usually, Illumina sequencing produces two sequences of the same
>         length for each pair of sequences.
>         
>         >
>         >
>         >         Do you get anything in LibraryStatistics.txt ?
>         >
>         >
>         > The contents of LibraryStatistics.txt are:
>         >    NumberOfPairedLibraries: 3
>         >
>         >
>         > LibraryNumber: 0
>         >  InputFormat: Interleaved,Paired
>         >  DetectionType: Automatic
>         >  File: /home/s4196896/mix_assembly/input/t15c15/gs1.shuffled.fasta.gz
>         >   NumberOfSequences: 248332323
>         >  Distribution: 31/Library0.txt
>         >
>         >
>         > LibraryNumber: 1
>         >  InputFormat: Interleaved,Paired
>         >  DetectionType: Automatic
>         >  File: /home/s4196896/mix_assembly/input/t15c15/gs3.shuffled.fasta.gz
>         >   NumberOfSequences: 405911176
>         >  Distribution: 31/Library1.txt
>         >
>         >
>         > LibraryNumber: 2
>         >  InputFormat: Interleaved,Paired
>         >  DetectionType: Automatic
>         >  File: /home/s4196896/mix_assembly/input/t15c15/gs2.shuffled.fasta.gz
>         >   NumberOfSequences: 234114234
>         >  Distribution: 31/Library2.txt
>         >
>         
>         
>         Is there anything in 31/Library0.txt, 31/Library1.txt, and
>         31/Library2.txt?
>         
>         
>         Can you provide the last 10 lines of SeedLengthDistribution.txt?
>         
>         >
>         > Best Regards,
>         > Huanle
>         >
>         >         On Thu, 2012-03-01 at 17:06 -0500, LIU wrote:
>         >         > Hi There,
>         >         >
>         >         > I have been using Ray for de novo assembly.
>         >         >
>         >         > The input reads are a mix of Illumina paired-end reads (these
>         >         > account for 90%), Illumina single-end reads, and 454 single-end
>         >         > reads.
>         >         >
>         >         > The command I used is:
>         >         > mpiexec -n 60 Ray \
>         >         >  -i /home/s4196896/mix_assembly/input/t15c15/gs1.shuffled.fasta.gz \
>         >         >  -i /home/s4196896/mix_assembly/input/t15c15/gs2.shuffled.fasta.gz \
>         >         >  -i /home/s4196896/mix_assembly/input/t15c15/gs3.shuffled.fasta.gz \
>         >         >  -s /home/s4196896/mix_assembly/input/t15c15/gs2.single.fasta.gz \
>         >         >  -s /home/s4196896/mix_assembly/input/t15c15/gs3.single.fasta.gz \
>         >         >  -s /home/s4196896/mix_assembly/input/t15c15/gs1.single.fasta.gz \
>         >         >  -s /home/s4196896/mix_assembly/input/radseq1.seeds.fasta \
>         >         >  -s /home/s4196896/mix_assembly/input/radseq_v2.fasta \
>         >         >  -s /work1/s4196896/454_assembly/raw_reads/all_genomic_reads.short.fasta \
>         >         >  -s /work1/s4196896/454_assembly/raw_reads/all_genomic_reads.long.fasta \
>         >         >  -o 71 \
>         >         >  -k 71
>         >         >
>         >         > The output shows that the scaffolds and contigs are the same
>         >         > (same N50, total number of bases, number of sequences, etc.).
>         >         >
>         >         > This confused me.
>         >         >
>         >         >
>         >         > I hope someone can help me out.
>         >         >
>         >         > Thanks in advance.
>         >         >
>         >         > Kind Regards,
>         >         > --
>         >         > Huanle
>         >         >
>         >         > School of biological Sciences, UQ, QLD, AU
>         >
>         >
>         >
>         >
>         >
>         >
>         >
>         > --
>         > Huanle
>         >
>         > School of biological Sciences, UQ, QLD, AU
>         >
>         
>         
>         
> 
> 
> 
> 
> -- 
> Huanle 
> 
> School of biological Sciences, UQ, QLD, AU
> 




_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
