Hi Adrian,

How consistent is the column "Mode k-mer coverage depth" in the file
Assembly/BiologicalAbundances/_DeNovoAssembly/Contigs.tsv with the
"PeakCoverage" value in Assembly/CoverageDistributionAnalysis.txt?


Ray v1.7 computes the peak coverage from the genome-wide de Bruijn
graph, whereas Ray v2.0-rc5 computes it locally. The local estimate
should be better most of the time, because coverage is not uniform
along the genome.
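To illustrate the difference, here is a minimal Python sketch; it is not Ray's implementation. It assumes a hypothetical input mapping each contig to the coverage values of its k-mers, takes the mode of the pooled histogram as the genome-wide peak, and the mode of each contig's own k-mers as the local peak. `min_coverage` plays the role of MinimumCoverage (7 in both reports below); the function names are made up for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple


def mode_coverage(coverages: List[int], min_coverage: int = 1) -> int:
    """Most frequent coverage among k-mers with coverage >= min_coverage.

    Ties go to the lower coverage value.
    """
    counts = Counter(c for c in coverages if c >= min_coverage)
    if not counts:
        raise ValueError("no k-mer reaches min_coverage")
    # Rank by (frequency, -coverage): highest frequency first, then lowest coverage.
    _, neg_cov = max((freq, -cov) for cov, freq in counts.items())
    return -neg_cov


def global_and_local_peaks(
    contig_kmers: Dict[str, List[int]], min_coverage: int = 1
) -> Tuple[int, Dict[str, int]]:
    """Return the genome-wide peak and a per-contig (local) peak.

    Global: one histogram pooled over every k-mer, so all contigs share one value.
    Local: one histogram per contig, so a deeply covered region gets its own peak.
    """
    pooled = [c for covs in contig_kmers.values() for c in covs]
    global_peak = mode_coverage(pooled, min_coverage)
    local_peaks = {
        name: mode_coverage(covs, min_coverage) for name, covs in contig_kmers.items()
    }
    return global_peak, local_peaks


if __name__ == "__main__":
    # Toy coverages (invented): contig_B lies in a region sequenced about twice as deep.
    toy = {
        "contig_A": [38, 40, 40, 41, 40, 39],
        "contig_B": [79, 80, 80, 81, 80],
        "contig_C": [40, 42, 40, 40],
    }
    print(global_and_local_peaks(toy, min_coverage=7))
    # (40, {'contig_A': 40, 'contig_B': 80, 'contig_C': 40})
```

With only the genome-wide peak, contig_B would be judged against 40 and could be mistaken for a two-copy repeat; its local peak of 80 reflects its own sequencing depth. In cases like this the per-contig mode and the global PeakCoverage are expected to disagree.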


See also my questions below.

On Sun, 2012-04-01 at 08:05 -0400, Adrian Platts wrote:
> Hi Sebastien,
> 
> 
> Below are some contig results from running Ray 1.7 and 2.0-rc5 on the
> same plant data with the same basic command line. The data are 3'-trimmed
> paired-end 100 nt (nominal) Illumina reads, plus a small set of
> single-end reads from the same source.

Do you notice any difference in assembly quality if you omit this
little extraneous step of read trimming?

>  Do you think the small drop in N50/mean contig size indicates
> higher confidence?

It may; see my comment above on coverage locality.

There is a presentation entitled "N50 must die!?" by Professor Ian Korf
from the University of California, Davis.

According to Ian Korf and the Assemblathon, one good metric is the
number of genes from a related living organism that align correctly to
the assembly.

And, as the ALLPATHS-LG paper also pointed out, it is equally
important to assess long-range correctness.


As with anything, diversity wins: it is likely better to have more
metrics.

        
                 Sébastien


>   It does not seem to be due to a few more small sequences in the 2.0-rc5
> output, as the difference in size persists if we filter out contigs
> shorter than 500 nt (or 1000 nt).

ok.
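For reference, a minimal Python sketch of the statistics being compared, assuming the contig lengths have already been read from Contigs.fasta (FASTA parsing not shown). `contig_stats` and its `min_length` cut-off are hypothetical names, and the script that produced the reports below may use slightly different conventions (ties, medians of even-sized lists). The toy lengths show why the length filter is a good check: dropping short contigs moves the mean and median a lot but can leave N50 unchanged, so an N50 drop that survives the filter is not explained by extra short sequences alone.

```python
from statistics import median
from typing import Dict, List


def contig_stats(lengths: List[int], min_length: int = 0) -> Dict[str, float]:
    """Count, total, mean, median, N50 and L50 of contigs at least min_length long.

    N50: length of the contig at which the running sum of lengths, sorted from
    longest to shortest, first reaches half of the total size.
    L50: number of contigs needed to reach that point.
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no contig passes the length filter")
    total = sum(kept)
    n50 = l50 = 0
    running = 0
    for rank, n in enumerate(kept, start=1):
        running += n
        if 2 * running >= total:  # integer-only test for running >= total / 2
            n50, l50 = n, rank
            break
    return {
        "count": len(kept),
        "total": total,
        "mean": total / len(kept),
        "median": median(kept),
        "N50": n50,
        "L50": l50,
    }


if __name__ == "__main__":
    toy = [5000, 4000, 3000, 2000, 1000, 400, 300, 200, 100]  # invented lengths
    print(contig_stats(toy))                  # N50 4000, mean ~1778, median 1000
    print(contig_stats(toy, min_length=500))  # N50 4000, mean 3000, median 3000
```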

> 
> 
> Thanks as always for the great work,  looking forward to exploring the
> new population features.
> 

Thank you for your testing work.

> 
> Adrian
> Adrian Platts
> McGill
> 
> 
> cat RayCommand.txt
> mpiexec -n 40 Ray \
>  -i 31Q_paired_trimmed_s_4_1_sequence.fasta 286 30 \
>  -i 31Q_paired_trimmed_s_5_1_sequence.fasta 286 30 \
>  -i 28-20-65_paired_trimmed_r1_s_5_1_sequence.fasta 286 30 \
>  -s paired_trimmed_s_7_1_Sisymbrium_sequence.fasta \
>  -s paired_trimmed_s_4_1_S_Irio_sequence.fasta \
>  -k 35 \
>  -o SI_Ray_35MP5_10_as-single_PE
> 
> Ray 1.7
> 
> 
> cat CoverageDistributionAnalysis.txt
> k-mer length: 35
> Lowest coverage observed: 2
> MinimumCoverage: 7
> PeakCoverage: 40
> RepeatCoverage: 73
> Number of k-mers with at least MinimumCoverage: 428826618 k-mers
> Estimated genome length: 214413309 nucleotides
> Percentage of vertices with coverage 2: 14.4033 %
> DistributionFile: SI_Ray_35/CoverageDistribution.txt
> 
> 
> <-- Information for assembly 'Contigs.fasta' -->
> 
>                                          Number of scaffolds  95362
>                                      Total size of scaffolds  200009871
>                                             Longest scaffold  129123
>                                            Shortest scaffold  100
>                                 Number of scaffolds > 500 nt  31423  33.0%
>                                  Number of scaffolds > 1K nt  26446  27.7%
>                                 Number of scaffolds > 10K nt  5268   5.5%
>                                Number of scaffolds > 100K nt  5   0.0%
>                                  Number of scaffolds > 1M nt  0   0.0%
>                                           Mean scaffold size  2097
>                                         Median scaffold size  177
>                                          N50 scaffold length  11154
>                                           L50 scaffold count  4559
>                                          LG50 scaffold count  3349
>               N50 scaffold - NG50 scaffold length difference  2720
>                                                  scaffold %A  32.16
>                                                  scaffold %C  18.03
>                                                  scaffold %G  17.95
>                                                  scaffold %T  31.86
>                                                  scaffold %N  0.00
>                                          scaffold %non-ACGTN  0.00
>                              Number of scaffold non-ACGTN nt  0
> 
>                 Percentage of assembly in scaffolded contigs  0.0%
>               Percentage of assembly in unscaffolded contigs  100.0%
>                       Average number of contigs per scaffold  1.0
> Average length of break (>25 Ns) between contigs in scaffold  0
> 
>                                            Number of contigs  95362
>                               Number of contigs in scaffolds  0
>                           Number of contigs not in scaffolds  95362
>                                        Total size of contigs  200009871
>                                               Longest contig  129123
>                                              Shortest contig  100
>                                   Number of contigs > 500 nt  31423  33.0%
>                                    Number of contigs > 1K nt  26446  27.7%
>                                   Number of contigs > 10K nt  5268   5.5%
>                                  Number of contigs > 100K nt  5   0.0%
>                                    Number of contigs > 1M nt  0   0.0%
>                                             Mean contig size  2097
>                                           Median contig size  177
>                                            N50 contig length  11154
>                                             L50 contig count  4559
>                                            LG50 contig count  3349
>                   N50 contig - NG50 contig length difference  2720
>                                                    contig %A  32.16
>                                                    contig %C  18.03
>                                                    contig %G  17.95
>                                                    contig %T  31.86
>                                                    contig %N  0.00
>                                            contig %non-ACGTN  0.00
>                                Number of contig non-ACGTN nt  0
> 
> 
> Ray 2.0-rc5
> 
> 
> cat CoverageDistributionAnalysis.txt
> k-mer length: 35
> Lowest coverage observed: 2
> MinimumCoverage: 7
> PeakCoverage: 40
> RepeatCoverage: 73
> Number of k-mers with at least MinimumCoverage: 428826612 k-mers
> Percentage of vertices with coverage 2: 14.4728 %
> DistributionFile: SI_Ray_35/CoverageDistribution.txt
> 
> 
> <-- Information for assembly 'Contigs.fasta' -->
> 
>                                          Number of scaffolds  117899
>                                      Total size of scaffolds  204514851
>                                             Longest scaffold  129123
>                                            Shortest scaffold  100
>                                 Number of scaffolds > 500 nt  33560  28.5%
>                                  Number of scaffolds > 1K nt  28152  23.9%
>                                 Number of scaffolds > 10K nt  5199   4.4%
>                                Number of scaffolds > 100K nt  6   0.0%
>                                  Number of scaffolds > 1M nt  0   0.0%
>                                           Mean scaffold size  1735
>                                         Median scaffold size  165
>                                          N50 scaffold length  10514
>                                           L50 scaffold count  4870
>                                          LG50 scaffold count  3411
>               N50 scaffold - NG50 scaffold length difference  2837
>                                                  scaffold %A  32.14
>                                                  scaffold %C  18.06
>                                                  scaffold %G  17.99
>                                                  scaffold %T  31.82
>                                                  scaffold %N  0.00
>                                          scaffold %non-ACGTN  0.00
>                              Number of scaffold non-ACGTN nt  0
> 
>                 Percentage of assembly in scaffolded contigs  0.0%
>               Percentage of assembly in unscaffolded contigs  100.0%
>                       Average number of contigs per scaffold  1.0
> Average length of break (>25 Ns) between contigs in scaffold  0
> 
>                                            Number of contigs  117899
>                               Number of contigs in scaffolds  0
>                           Number of contigs not in scaffolds  117899
>                                        Total size of contigs  204514851
>                                               Longest contig  129123
>                                              Shortest contig  100
>                                   Number of contigs > 500 nt  33560  28.5%
>                                    Number of contigs > 1K nt  28152  23.9%
>                                   Number of contigs > 10K nt  5199   4.4%
>                                  Number of contigs > 100K nt  6   0.0%
>                                    Number of contigs > 1M nt  0   0.0%
>                                             Mean contig size  1735
>                                           Median contig size  165
>                                            N50 contig length  10514
>                                             L50 contig count  4870
>                                            LG50 contig count  3411
>                   N50 contig - NG50 contig length difference  2837
>                                                    contig %A  32.14
>                                                    contig %C  18.06
>                                                    contig %G  17.99
>                                                    contig %T  31.82
>                                                    contig %N  0.00
>                                            contig %non-ACGTN  0.00
>                                Number of contig non-ACGTN nt  0

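A side note on the two CoverageDistributionAnalysis.txt reports: the "Estimated genome length" printed by Ray 1.7 is exactly half of the "Number of k-mers with at least MinimumCoverage" (428826618 / 2 = 214413309), which is consistent with each k-mer being counted once per strand. The 2.0-rc5 report no longer prints that line. The Python sketch below applies the same halving to both reports (an assumption for rc5, not something its report states) and compares the result with each assembly's "Total size of contigs"; the function name is hypothetical.

```python
# Figures copied from the quoted CoverageDistributionAnalysis.txt reports and
# the "Total size of contigs" lines of each assembly.
REPORTS = {
    "Ray 1.7": {"kmers_at_min_coverage": 428_826_618, "total_contig_size": 200_009_871},
    "Ray 2.0-rc5": {"kmers_at_min_coverage": 428_826_612, "total_contig_size": 204_514_851},
}


def estimated_genome_length(kmers_at_min_coverage: int) -> int:
    """Halve the k-mer count.

    Assumption: the count includes each k-mer and its reverse complement.
    This reproduces Ray 1.7's printed estimate, but Ray's own formula may differ.
    """
    return kmers_at_min_coverage // 2


for version, report in REPORTS.items():
    genome = estimated_genome_length(report["kmers_at_min_coverage"])
    fraction = report["total_contig_size"] / genome
    print(f"{version}: estimated genome {genome} nt, total contig size {fraction:.1%} of it")
```

With the quoted figures this prints 93.3% for Ray 1.7 and 95.4% for Ray 2.0-rc5: the rc5 contigs add up to about 4.5 Mb more sequence despite the lower N50.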


_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
