Hi!

Peter Huang wrote:
> Hi Ray group,
>
> I am using Ray to assemble our sequencing data. As some of the reads
> mis-assembled onto our final scaffolds and we have many low-coverage
> contigs hanging around, I am curious whether there is a flag to eliminate
> contigs with low coverage, such as 5 or 10. (I know Ray has a flag to set
> the minimum contig length.)
I just added this option:

       -use-minimum-seed-coverage minimumSeedCoverageDepth
              Sets the minimum seed coverage depth.
              Any path with a coverage depth lower than this will be discarded.
              The default is 0.
              Example: -use-minimum-seed-coverage 40

You will need to install Ray (and RayPlatform) from the git repository.

The changes:

 MANUAL_PAGE.txt                         |  4 ++++
 code/application_core/Parameters.cpp    |  5 ++++-
 code/plugin_SeedingData/SeedingData.cpp | 13 ++++++++++++-
 code/plugin_SeedingData/SeedingData.h   |  6 ++++++
 4 files changed, 26 insertions(+), 2 deletions(-)

This option will also be available in Ray v2.1.0, which should ship around mid-September 2012.

Also, Ray creates a file containing metadata for each contig; you can use its 'Mode k-mer coverage depth' column:

[boiseb01@ls30 RayKmerSearchDevel]$ head TestX/BiologicalAbundances/_DeNovoAssembly/Contigs.tsv
#Contig name    K-mer length  Length in k-mers  Colored k-mers  Proportion  Mode k-mer coverage depth  K-mer observations  Total     Proportion
contig-0        21            9859              0               0           30                         295770              60497522  0.00488896
contig-15       21            6874              0               0           28                         192472              60497522  0.00318149
contig-16       21            3353              0               0           31                         103943              60497522  0.00171814
contig-14       21            8809              0               0           32                         281888              60497522  0.0046595
contig-1000000  21            139               0               0           88                         12232               60497522  0.00020219
contig-1000015  21            558               0               0           58                         32364               60497522  0.000534964
contig-3        21            9297              0               0           29                         269613              60497522  0.0044566
contig-7        21            9644              0               0           30                         289320              60497522  0.00478234
contig-27       21            12701             0               0           30                         381030              60497522  0.00629827

> In addition, is there a cut-off for k-mers as well, so that k-mers with
> low coverage are eliminated at an early stage of the assembly?

That is what -use-minimum-seed-coverage (described above) does: any path with a coverage depth lower than minimumSeedCoverageDepth is discarded. It is not available in v2.0.0; install from the git repository as described above.
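If you would rather filter after the assembly, using the 'Mode k-mer coverage depth' column of Contigs.tsv mentioned above, here is a minimal Python sketch. It is a hypothetical helper, not part of Ray. It assumes the file is tab-separated and that its first line is the '#Contig name' header shown in the head output above. It keeps contigs whose mode depth is at or above the threshold, mirroring the "lower than this will be discarded" rule. The script name filter_contigs.py is also hypothetical.

```python
import csv
import sys
from typing import Iterable, List

# Column names as they appear in the Contigs.tsv header shown above.
NAME_COLUMN = "#Contig name"
MODE_COLUMN = "Mode k-mer coverage depth"


def contigs_above_coverage(lines: Iterable[str], min_depth: int) -> List[str]:
    """Return names of contigs whose mode k-mer coverage depth is >= min_depth.

    Assumption (not confirmed by Ray's documentation): the input is
    tab-separated and its first line is the header starting with '#Contig name'.
    The header contains 'Proportion' twice; DictReader keeps only the last one,
    which does not matter here because neither column is used.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    kept = []
    for row in reader:
        # Drop contigs below the threshold; keep those at or above it.
        if int(row[MODE_COLUMN]) >= min_depth:
            kept.append(row[NAME_COLUMN])
    return kept


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python filter_contigs.py Contigs.tsv 10
    with open(sys.argv[1], newline="") as handle:
        for name in contigs_above_coverage(handle, int(sys.argv[2])):
            print(name)
```

With a threshold of 30 on the rows shown above, contig-15 (mode 28) and contig-3 (mode 29) would be dropped and the other seven kept. The printed names can then be used to select the matching contig sequences with whatever sequence tool you already use.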
There is also an option that discards seeds with too much coverage:

       -use-maximum-seed-coverage
              Ignores any seed with a coverage depth above this threshold.
              The default is 4294967295.

(4294967295 is 2^32 - 1, the largest 32-bit unsigned integer, so the default effectively sets no upper limit.)

If the problem is memory usage caused by erroneous k-mers, you can increase the number of bits in the Bloom filter:

       -bloom-filter-bits bits
              Sets the number of bits for the Bloom filter.
              Default is 268435456 bits; 0 bits disables the Bloom filter.

(268435456 bits is 2^28 bits, i.e. 32 MiB.)

This option was added recently, so you will need to install from the git repository. There are other useful new options for tuning the distributed in-memory storage engine; see MANUAL_PAGE.txt.

Sébastien Boisvert

> Thanks.
>
> Best,
>
> Peter

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users