> > That is the first thing you should do I think.
>
> k=21 will pick up more redundant k-mers than k=31 or k=61.
>
> Basically, this is why:
>
> If the sequencing error rate is > 4.7% (1/21), then mostly all k-mers will be unique and bad for k=21.
> If the sequencing error rate is > 3.2% (1/31), then mostly all k-mers will be unique and bad for k=31.
> If the sequencing error rate is > 1.6% (1/61), then mostly all k-mers will be unique and bad for k=61.
>
> I believe the Illumina HiSeq TruSeq sequencing error rate varies between 0 and 2%.
> Your mileage may vary, however, depending on the quality of the DNA and the library
> preparation (nicks in the DNA during library preparation, for instance).
>
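To make the arithmetic in the quoted message concrete, here is a minimal sketch, not anything taken from an assembler. It assumes errors are independent and uniform across bases, which real Illumina reads do not strictly satisfy (errors tend to cluster toward the 3' end), and the function names are hypothetical. Under that assumption a k-mer carries on average error_rate * k errors, so 1/k is the rate at which a k-mer is expected to contain one error, and (1 - error_rate)^k is the fraction of k-mers with no error at all.

```python
# Rough model only: assumes independent, uniform per-base errors.
# Function names are illustrative, not from any assembler.

def expected_errors_per_kmer(error_rate: float, k: int) -> float:
    """Average number of erroneous bases in a k-mer."""
    return error_rate * k


def break_even_error_rate(k: int) -> float:
    """Per-base error rate at which a k-mer is expected to hold one error."""
    return 1.0 / k


def fraction_error_free_kmers(error_rate: float, k: int) -> float:
    """Probability that all k bases of a k-mer were read correctly."""
    return (1.0 - error_rate) ** k


if __name__ == "__main__":
    for k in (21, 31, 61):
        threshold = break_even_error_rate(k)
        print(f"k={k}: break-even error rate = {threshold:.2%}")
        for rate in (0.005, 0.01, 0.02):
            clean = fraction_error_free_kmers(rate, k)
            print(f"  error rate {rate:.1%}: {clean:.1%} of k-mers error-free")
```

Even below the 1/k threshold, the error-free fraction drops quickly as k grows, which fits the point about trimming before using long k-mers.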
I believe that most people using long k-mers in assembly have been 3' trimming or rejecting reads where the quality falls below (at least) Q=31 on a per-read basis, and below Q=35 with the newer Illumina chemistries. I don't know of anyone who is really putting raw FASTA read data directly into the assemblers, certainly not when using long k-mers, for exactly the reason above. FASTQ data perhaps, but the way FASTQ data is handled can differ quite a bit between assemblers, so I think the results can be mixed.

Adrian

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users