I'm again trying to use Ray to assemble something circular with crazy high coverage; this time it's a set of circular Neisseria meningitidis [bacterial] genomes. Coverage distributions can be found here:
http://www.gringene.org/data/CoverageDistribution.txt

It seems like Ray is picking 2 as the peak coverage:
http://www.gringene.org/data/CoverageDistributionAnalysis.txt

whereas, from my reading of this graph, the correct value should be something closer to 500:
http://www.gringene.org/data/coverage_plot.pdf

Presumably this is due to rare SNPs swamping out the numbers, as per Rick Westerman's email on 12 Feb 2014:

> The above concept works well if there are no sequencing errors. But of
> course there are thus kmers with errors become unique and we see
> the initial part of the occurrence list look more like :
> kmers occurring 1 time: 10,000
> kmers occurring 2 times: 3,000
> kmers occurring 3 times: 500
> ...

I'd recommend ignoring the first few numbers on the coverage plot if it starts high and drops down (i.e. has a negative slope), and only considering peaks once there has been a somewhat consistent increase in coverage values. The ImageJ peak finder does this by default, excluding peaks on the edges of a graph if they're not surrounded by valleys:
http://imagej.nih.gov/ij/source/ij/plugin/filter/MaximumFinder.java [see 'findMaxima' function]

I have no idea if this peak coverage problem will impact how Ray deals with the assembly. If it does, I'll probably need to do some kmer-based error correction prior to assembly with Ray, in the hope that it will bring those kmers with coverage 2-20 down to frequencies below the true peak value.

-- 
David Eccles
Bioinformatics Research Analyst, Gringene Bioinformatics
Room 2.10 x857
Malaghan Institute of Medical Research
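
A minimal sketch of the peak-picking idea described above: skip the initial falling slope and only look for a peak once the counts have risen for a few consecutive coverage values. This is an illustration only, not Ray's or ImageJ's actual implementation; the function name `find_coverage_peak`, the `min_rise` parameter, and the toy numbers are all assumptions made for the example.

```python
def find_coverage_peak(coverages, counts, min_rise=3):
    """Return the coverage at the main peak, ignoring the initial error slope.

    coverages: kmer coverage values, in ascending order.
    counts:    number of kmers observed at each coverage value.
    min_rise:  hypothetical threshold; how many consecutive increases in
               counts are required before peaks are considered.
    Returns None if the counts never rise for min_rise consecutive steps.
    """
    if len(coverages) != len(counts):
        raise ValueError("coverages and counts must have the same length")

    n = len(counts)
    for start in range(n - min_rise):
        window = counts[start:start + min_rise + 1]
        # A "somewhat consistent increase": every step in the window goes up.
        if all(a < b for a, b in zip(window, window[1:])):
            # Everything before 'start' is treated as the error slope;
            # pick the highest count from here onwards.
            best = max(range(start, n), key=lambda i: counts[i])
            return coverages[best]
    return None


# Toy distribution (made-up numbers): error kmers dominate at low coverage,
# and the genuine peak sits at coverage 9.
toy_coverages = list(range(1, 13))
toy_counts = [10000, 3000, 500, 100, 40, 60, 90, 150, 300, 200, 80, 20]

naive_peak = toy_coverages[toy_counts.index(max(toy_counts))]
slope_aware_peak = find_coverage_peak(toy_coverages, toy_counts)
print(naive_peak, slope_aware_peak)
```

On the toy data, the plain maximum lands on coverage 1 (the error kmers), while the slope-aware version reports coverage 9. Real distributions are noisier, so some smoothing or a larger `min_rise` would probably be needed in practice.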