I'm again trying to use Ray to assemble something circular with crazy high
coverage; this time it's a set of circular Neisseria meningitidis [bacterial]
genomes. Coverage distributions can be found here:

http://www.gringene.org/data/CoverageDistribution.txt

It seems like Ray is picking 2 as the peak coverage:

http://www.gringene.org/data/CoverageDistributionAnalysis.txt

Judging from this graph, though, the correct value should be something
closer to 500:

http://www.gringene.org/data/coverage_plot.pdf

Presumably this is due to rare SNPs swamping out the numbers, as per Rick 
Westerman's email on 12 Feb 2014:

> The above concept works well if there are no sequencing errors.  But of 
> course there are thus kmers with errors become unique and we see
> the initial part of the occurrence list look more like :
> kmers occurring 1 time:  10,000
> kmers occurring 2 times:  3,000
> kmers occurring 3 times:    500
> ...

I'd recommend ignoring the first few values on the coverage plot if it starts
high and drops down (i.e. negative slope), and only considering peaks once
there has been a somewhat consistent increase in coverage values. The ImageJ
peak finder does this by default, excluding peaks on the edges of a graph if
they're not surrounded by valleys:

http://imagej.nih.gov/ij/source/ij/plugin/filter/MaximumFinder.java

[see 'findMaxima' function]
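
To make the suggestion concrete, here is a rough Python sketch of the idea. It is not Ray's code and not a port of ImageJ's findMaxima; the function name, the min_rise parameter and the (coverage, count) input layout are my own assumptions, and the numbers in the example are made up for illustration.

```python
def find_coverage_peak(histogram, min_rise=3):
    """Pick a peak coverage while ignoring a leading error tail.

    histogram: list of (coverage, count) pairs sorted by coverage
               (assumed layout, not necessarily Ray's file format).
    min_rise:  hypothetical threshold; number of consecutive increases
               in count required before a peak is considered.
    Returns the coverage value at the peak, or None if the counts
    never rise consistently.
    """
    counts = [count for _, count in histogram]
    rise = 0
    for i in range(1, len(counts)):
        # Track the length of the current run of increasing counts.
        if counts[i] > counts[i - 1]:
            rise += 1
        else:
            rise = 0
        if rise >= min_rise:
            # The run started at the valley after the initial drop.
            valley = i - min_rise
            # Take the highest count from the valley onwards.
            best = max(range(valley, len(counts)), key=counts.__getitem__)
            return histogram[best][0]
    return None


# Made-up distribution: error k-mers dominate at low coverage,
# with the real peak further along.
example = [(1, 10000), (2, 3000), (3, 500), (4, 100), (5, 120),
           (6, 200), (7, 400), (8, 900), (9, 700), (10, 300)]
print(find_coverage_peak(example))
```

Requiring several consecutive increases (rather than just the first uptick) is there so that a small bump inside the error tail isn't mistaken for the valley.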

I have no idea if this peak coverage problem will impact how Ray deals with the 
assembly. If it does, I'll probably need to do some
kmer-based error correction prior to assembly with Ray in the hope that it will 
bring those kmers with coverage 2-20 down to frequencies 
below the true peak value.

-- 
David Eccles
Bioinformatics Research Analyst, Gringene Bioinformatics
Room 2.10 x857
Malaghan Institute of Medical Research

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users