> 
> That is the first thing you should do, I think. k=21 will pick up more 
> redundant k-mers than k=31 or k=61.
> 
> Basically, this is why:
> 
> 
> If the sequencing error rate is > 4.8% (1/21), then almost all k-mers will be 
> unique (and therefore bad) for k=21.
> If the sequencing error rate is > 3.2% (1/31), then almost all k-mers will be 
> unique (and therefore bad) for k=31.
> If the sequencing error rate is > 1.6% (1/61), then almost all k-mers will be 
> unique (and therefore bad) for k=61.
> 
> I believe the Illumina HiSeq TruSeq sequencing error rate varies between 0 
> and 2%. Your mileage may vary, however, depending on the quality of the DNA 
> and of the library preparation (nicks in the DNA introduced during library 
> preparation, for instance).
> 

I believe that most people using long k-mers in assembly have been 3' 
trimming or rejecting reads, on a per-read basis, where they fall below 
(at least) Q=31, and below Q=35 with the newer Illumina chemistries. I don't 
know of anyone who is really putting raw FASTA read data directly into the 
assemblers, and certainly not when using long k-mers, for exactly the reason 
above.
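
To put rough numbers on the 1/k rule of thumb quoted above, here is a minimal
sketch. It assumes errors are independent and occur at one uniform per-base
rate, which real Illumina reads do not strictly obey (errors tend to cluster
toward the 3' end). The function names are hypothetical, chosen only for this
illustration.

```python
def error_free_kmer_fraction(error_rate: float, k: int) -> float:
    """Probability that a k-mer contains no sequencing error.

    Assumption: per-base errors are independent and uniform, so the
    probability that all k bases are correct is (1 - error_rate) ** k.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return (1.0 - error_rate) ** k


def expected_errors_per_kmer(error_rate: float, k: int) -> float:
    """Mean number of erroneous bases in a k-mer (error_rate * k)."""
    return error_rate * k


if __name__ == "__main__":
    # k values from the thread; error rates span the 0-2% range mentioned.
    for k in (21, 31, 61):
        for rate in (0.005, 0.01, 0.02):
            frac = error_free_kmer_fraction(rate, k)
            mean_err = expected_errors_per_kmer(rate, k)
            print(f"k={k:2d} rate={rate:.3f} "
                  f"error-free fraction={frac:.3f} "
                  f"mean errors per k-mer={mean_err:.2f}")
```

Under this simplified model, an error rate of exactly 1/k gives one expected
error per k-mer and leaves roughly 1/e (about 37%) of k-mers error-free, so
the thresholds above mark where erroneous k-mers start to dominate rather
than a hard cutoff.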

FASTQ data perhaps, I suppose, but the way FASTQ data is handled can differ 
quite a bit between assemblers, so the results can be mixed, I think.
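
As an illustration of the 3' trimming described above, here is a minimal
sketch of one simple scheme: cut bases from the 3' end until a base at or
above the quality threshold is reached, then reject the read if it is shorter
than k. It assumes Phred+33 quality encoding, and the function names and
parameters are hypothetical. Real trimmers often use windowed or
running-sum criteria instead, so treat this as one possible approach rather
than what any particular tool does.

```python
def trim_3prime(seq: str, qual: str, min_q: int, offset: int = 33):
    """Trim low-quality bases from the 3' end of a single read.

    seq    : base calls
    qual   : FASTQ quality string, same length as seq
    min_q  : minimum Phred score a base needs to stop the trimming
    offset : ASCII offset of the encoding (assumed Phred+33 here)
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Walk back from the 3' end while the last kept base is below min_q.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def long_enough(seq: str, k: int) -> bool:
    """A trimmed read only contributes k-mers if it still has >= k bases."""
    return len(seq) >= k


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    seq, qual = trim_3prime("ACGTACGT", "IIIII###", min_q=31)
    print(seq, qual, long_enough(seq, k=5))
```

The threshold is left as a required argument so that the Q values mentioned
in this thread, or any other cutoff, can be passed in by the caller.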

Adrian


_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
