Hi Adrian,

How consistent is the column "Mode k-mer coverage depth" in the file
Assembly/BiologicalAbundances/_DeNovoAssembly/Contigs.tsv with the
"PeakCoverage" in Assembly/CoverageDistribution.txt?

Ray v1.7 computes the peak coverage using the genome-wide de Bruijn graph,
whereas Ray v2.0-rc5 computes it locally, which should be better most of the
time because coverage is not even everywhere.

See also my questions below.

On Sun, 2012-04-01 at 08:05 -0400, Adrian Platts wrote:
> Hi Sebastien,
>
> Below are some contig results of running Ray 1.7 and 2.0rc5 on the
> same plant data using the same basic command line.  The data is 3'
> trimmed paired end 100nt (nominal) illumina and a small set of single
> end data from the same data source.

Do you notice any difference in assembly quality if you omit this extra
read-trimming step?

> Do you think the small drop in N50/mean contig size indicates a
> higher confidence?

It may be so; see my comment above on coverage locality.

There is a presentation entitled "N50 must die!?" by Professor Ian Korf from
the University of California at Davis. According to Ian Korf and the
Assemblathon, one good metric is the number of correctly aligning genes from
a related living organism. And, as the ALLPATHS-LG paper also pointed out, it
is equally important to assess long-range correctness.

Like anything, diversity wins, and it is likely better to have more metrics.

Sébastien

> It does not seem to be due to a few more small sequences in the 2rc5
> output as the difference in size persists if we filter
> out the <500nt and <1000nt contigs.

OK.

> Thanks as always for the great work, looking forward to exploring the
> new population features.

Thank you for your testing work.

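For comparing the N50, L50 and LG50 figures in the quoted reports, here is a minimal Python sketch of how such numbers are usually derived from a list of contig lengths. It assumes the common definitions (N50 is the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size; the "G" variants use a genome size estimate instead of the assembly size). The function name nx_stats and its parameters are hypothetical, not part of Ray or of the script that produced the reports.

```python
def nx_stats(lengths, fraction=0.5, reference_size=None):
    """Return (Nx length, Lx count) for a list of contig lengths.

    With reference_size=None the target is a fraction of the assembly size
    (N50/L50 for fraction=0.5). With a genome size estimate passed as
    reference_size, the same loop gives the NG50/LG50-style values.
    Returns None when the contigs never reach the target.
    """
    base = sum(lengths) if reference_size is None else reference_size
    target = fraction * base
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    return None


# Toy data, not taken from the assemblies discussed in this thread.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(nx_stats(toy_lengths))                     # (8, 3)
print(nx_stats(toy_lengths, reference_size=70))  # (6, 5)
```

Because the target depends on the total assembly size, adding many short contigs can lower N50 without changing the long contigs at all, which is one reason to look at several metrics together.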
> Adrian
>
> Adrian Platts
> McGill
>
> cat RayCommand.txt
> mpiexec -n 40 Ray \
>  -i \
>  31Q_paired_trimmed_s_4_1_sequence.fasta \
>  286 \
>  30 \
>  -i \
>  31Q_paired_trimmed_s_5_1_sequence.fasta \
>  286 \
>  30 \
>  -i \
>  28-20-65_paired_trimmed_r1_s_5_1_sequence.fasta \
>  286 \
>  30 \
>  -s \
>  paired_trimmed_s_7_1_Sisymbrium_sequence.fasta \
>  -s \
>  paired_trimmed_s_4_1_S_Irio_sequence.fasta \
>  -k \
>  35 \
>  -o \
>  SI_Ray_35MP5_10_as-single_PE
>
>
> Ray 1.7
>
> cat CoverageDistributionAnalysis.txt
> k-mer length: 35
> Lowest coverage observed: 2
> MinimumCoverage: 7
> PeakCoverage: 40
> RepeatCoverage: 73
> Number of k-mers with at least MinimumCoverage: 428826618 k-mers
> Estimated genome length: 214413309 nucleotides
> Percentage of vertices with coverage 2: 14.4033 %
> DistributionFile: SI_Ray_35/CoverageDistribution.txt
>
> <-- Information for assembly 'Contigs.fasta' -->
>
> Number of scaffolds                                           95362
> Total size of scaffolds                                       200009871
> Longest scaffold                                              129123
> Shortest scaffold                                             100
> Number of scaffolds > 500 nt                                  31423  33.0%
> Number of scaffolds > 1K nt                                   26446  27.7%
> Number of scaffolds > 10K nt                                  5268  5.5%
> Number of scaffolds > 100K nt                                 5  0.0%
> Number of scaffolds > 1M nt                                   0  0.0%
> Mean scaffold size                                            2097
> Median scaffold size                                          177
> N50 scaffold length                                           11154
> L50 scaffold count                                            4559
> LG50 scaffold count                                           3349
> N50 scaffold - NG50 scaffold length difference                2720
> scaffold %A                                                   32.16
> scaffold %C                                                   18.03
> scaffold %G                                                   17.95
> scaffold %T                                                   31.86
> scaffold %N                                                   0.00
> scaffold %non-ACGTN                                           0.00
> Number of scaffold non-ACGTN nt                               0
>
> Percentage of assembly in scaffolded contigs                  0.0%
> Percentage of assembly in unscaffolded contigs                100.0%
> Average number of contigs per scaffold                        1.0
> Average length of break (>25 Ns) between contigs in scaffold  0
>
> Number of contigs                                             95362
> Number of contigs in scaffolds                                0
> Number of contigs not in scaffolds                            95362
> Total size of contigs                                         200009871
> Longest contig                                                129123
> Shortest contig                                               100
> Number of contigs > 500 nt                                    31423  33.0%
> Number of contigs > 1K nt                                     26446  27.7%
> Number of contigs > 10K nt                                    5268  5.5%
> Number of contigs > 100K nt                                   5  0.0%
> Number of contigs > 1M nt                                     0  0.0%
> Mean contig size                                              2097
> Median contig size                                            177
> N50 contig length                                             11154
> L50 contig count                                              4559
> LG50 contig count                                             3349
> N50 contig - NG50 contig length difference                    2720
> contig %A                                                     32.16
> contig %C                                                     18.03
> contig %G                                                     17.95
> contig %T                                                     31.86
> contig %N                                                     0.00
> contig %non-ACGTN                                             0.00
> Number of contig non-ACGTN nt                                 0
>
>
> Ray 2.0 rc5
>
> cat CoverageDistributionAnalysis.txt
> k-mer length: 35
> Lowest coverage observed: 2
> MinimumCoverage: 7
> PeakCoverage: 40
> RepeatCoverage: 73
> Number of k-mers with at least MinimumCoverage: 428826612 k-mers
> Percentage of vertices with coverage 2: 14.4728 %
> DistributionFile: SI_Ray_35/CoverageDistribution.txt
>
> <-- Information for assembly 'Contigs.fasta' -->
>
> Number of scaffolds                                           117899
> Total size of scaffolds                                       204514851
> Longest scaffold                                              129123
> Shortest scaffold                                             100
> Number of scaffolds > 500 nt                                  33560  28.5%
> Number of scaffolds > 1K nt                                   28152  23.9%
> Number of scaffolds > 10K nt                                  5199  4.4%
> Number of scaffolds > 100K nt                                 6  0.0%
> Number of scaffolds > 1M nt                                   0  0.0%
> Mean scaffold size                                            1735
> Median scaffold size                                          165
> N50 scaffold length                                           10514
> L50 scaffold count                                            4870
> LG50 scaffold count                                           3411
> N50 scaffold - NG50 scaffold length difference                2837
> scaffold %A                                                   32.14
> scaffold %C                                                   18.06
> scaffold %G                                                   17.99
> scaffold %T                                                   31.82
> scaffold %N                                                   0.00
> scaffold %non-ACGTN                                           0.00
> Number of scaffold non-ACGTN nt                               0
>
> Percentage of assembly in scaffolded contigs                  0.0%
> Percentage of assembly in unscaffolded contigs                100.0%
> Average number of contigs per scaffold                        1.0
> Average length of break (>25 Ns) between contigs in scaffold  0
>
> Number of contigs                                             117899
> Number of contigs in scaffolds                                0
> Number of contigs not in scaffolds                            117899
> Total size of contigs                                         204514851
> Longest contig                                                129123
> Shortest contig                                               100
> Number of contigs > 500 nt                                    33560  28.5%
> Number of contigs > 1K nt                                     28152  23.9%
> Number of contigs > 10K nt                                    5199  4.4%
> Number of contigs > 100K nt                                   6  0.0%
> Number of contigs > 1M nt                                     0  0.0%
> Mean contig size                                              1735
> Median contig size                                            165
> N50 contig length                                             10514
> L50 contig count                                              4870
> LG50 contig count                                             3411
> N50 contig - NG50 contig length difference                    2837
> contig %A                                                     32.14
> contig %C                                                     18.06
> contig %G                                                     17.99
> contig %T                                                     31.82
> contig %N                                                     0.00
> contig %non-ACGTN                                             0.00
> Number of contig non-ACGTN nt                                 0

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users