The peak coverage reported in this file is not used in the Ray 2.x.x series.
The peak coverage is instead computed individually for each seed. See the section 'From Ray to Ray Meta' in http://genomebiology.com/2012/13/12/R122

On 25 February 2014 21:02, David Eccles (gringer) [bioinformat...@gringene.org] wrote:
> To: denovoassembler-users@lists.sourceforge.net
> Subject: [Denovoassembler-users] Coverage Distribution Analysis guesses coverage peak of 2 instead of 500
>
> I'm again trying to use Ray to assemble something circular with a crazy high
> coverage; this time it's a set of circular Neisseria
> meningitidis [bacterial] genomes. Coverage distributions can be found here:
>
> http://www.gringene.org/data/CoverageDistribution.txt
>
> It seems like Ray is picking 2 as the coverage:
>
> http://www.gringene.org/data/CoverageDistributionAnalysis.txt
>
> whereas the correct value, judging from this graph, should be
> something closer to 500:
>
> http://www.gringene.org/data/coverage_plot.pdf
>
> Presumably this is due to rare SNPs swamping out the numbers, as per Rick
> Westerman's email on 12 Feb 2014:
>
>> The above concept works well if there are no sequencing errors. But of
>> course there are, so kmers with errors become unique and we see
>> the initial part of the occurrence list look more like:
>> kmers occurring 1 time: 10,000
>> kmers occurring 2 times: 3,000
>> kmers occurring 3 times: 500
>> ...
>
> I'd recommend ignoring the first few numbers on the coverage plot if it
> starts high and drops down (i.e. negative slope), and only considering
> peaks once there has been a somewhat consistent increase in coverage values.
> The ImageJ peak finder does this by default, excluding peaks on
> the edges of a graph if they're not surrounded by valleys:
>
> http://imagej.nih.gov/ij/source/ij/plugin/filter/MaximumFinder.java
>
> [see 'findMaxima' function]
>
> I have no idea if this peak coverage problem will impact how Ray deals with
> the assembly.
> If it does, I'll probably need to do some
> kmer-based error correction prior to assembly with Ray in the hope that it
> will bring those kmers with coverage 2-20 down to frequencies
> below the true peak value.
>
> --
> David Eccles
> Bioinformatics Research Analyst, Gringene Bioinformatics
> Room 2.10 x857
> Malaghan Institute of Medical Research
>
> _______________________________________________
> Denovoassembler-users mailing list
> Denovoassembler-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
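The peak-picking heuristic suggested in the quoted message can be sketched in Python. The idea is to skip the initial, error-dominated descent of the k-mer coverage distribution and then take the highest bin after the first valley. This is a minimal illustration only. It is not how Ray computes peak coverage; as noted above, Ray 2.x computes it per seed. The function name `find_coverage_peak` is hypothetical. The input is assumed to be a mapping from coverage (occurrence count) to the number of k-mers seen at that coverage. Parsing CoverageDistribution.txt is left out because its exact layout is not described here.

```python
from typing import Dict, Optional


def find_coverage_peak(histogram: Dict[int, int]) -> Optional[int]:
    """Hypothetical helper (not Ray's implementation).

    `histogram` maps coverage -> number of k-mers with that coverage.
    Returns the coverage of the highest bin after the initial descending
    (sequencing-error) part of the distribution, or None if the
    distribution never rises again after that descent.
    """
    coverages = sorted(histogram)
    counts = [histogram[c] for c in coverages]

    # Walk past the leading run where counts keep falling (or stay flat):
    # these low-coverage bins are dominated by k-mers containing errors.
    i = 0
    while i + 1 < len(counts) and counts[i + 1] <= counts[i]:
        i += 1

    # i is now the first valley; if it is the last bin, there is no peak.
    if i + 1 >= len(counts):
        return None

    # Take the bin with the most k-mers after the valley.
    best = max(range(i + 1, len(counts)), key=lambda j: counts[j])
    return coverages[best]


if __name__ == "__main__":
    # Toy distribution: a large error tail at coverage 1-5, true peak at 8.
    toy = {1: 10000, 2: 3000, 3: 500, 4: 120, 5: 80,
           6: 150, 7: 400, 8: 900, 9: 600, 10: 200}
    print(find_coverage_peak(toy))  # prints 8
```

On this toy histogram, a naive maximum would return coverage 1, where the error k-mers are. The sketch returns 8 instead.

Real distributions are noisy, so a small bump inside the error tail can end the descent early. Smoothing the counts, or requiring several consecutive increases (the "somewhat consistent increase" idea), would make the sketch more robust. ImageJ's MaximumFinder also excludes maxima on the edges of the graph, which this sketch does not do.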