On 13/07/11 17:01, David Eccles (gringer) wrote:
> Um... what? So... the first few workers that get added don't actually do
> any work, and the activeWorkers array isn't populated until the first 5
> vertices are loaded. This agrees with what's on lines 138-144:
>
> int coverage=node->getCoverage(&vertexKey);
> int minimum=5;
> if(coverage<minimum){
>     m_completedJobs++;
> }else{
>     m_aliveWorkers[m_SEEDING_i].constructor(&vertexKey,m_parameters,
>         m_outboxAllocator,m_virtualCommunicator,m_SEEDING_i);
>     m_activeWorkers.insert(m_SEEDING_i);
> }

If I reduce that minimum down to 2 [i.e. changing that-which-should-not-be-changed], and change one of the output strings to base-space [it was previously displaying the root k-mer in colour-space], then I get a successful assembly, both for sebhtml/ray and for gringer/ray.

$ wc -l test_phiX.Contigs.fasta testSeb_phiX_5k.Contigs.fasta
  90 test_phiX.Contigs.fasta
  90 testSeb_phiX_5k.Contigs.fasta
 180 total
$ diff test_phiX.Contigs.fasta testSeb_phiX_5k.Contigs.fasta
[no output]
$ fasta_formatter -i ../tests/phix/phix.fasta | fastx_reverse_complement | grep $(fasta_formatter -i test_phiX.Scaffolds.fasta | grep -v '^>') > /dev/null && echo "success (match in reverse direction)"
success (match in reverse direction)

My code now does a correct assembly both on simulated and on real (circularised) phiX data:

$ ../code/Ray -s ~/illuminadata/110517_H134_0062_A816HKABXX/Data/Intensities/BaseCalls/fastq/ignore/head100k_interleaved_noN_phiX_region1101.fasta -o ray_output/test2_phiX_circular
...
Number of contigs: 1
Total length of contigs: 5385
Number of contigs >= 500 nt: 1
Total length of contigs >= 500 nt: 5385
Number of scaffolds: 1
Total length of scaffolds: 5385
Number of scaffolds >= 500 nt: 1
Total length of scaffolds >= 500: 5385

-- BLAST results (for 600bp sequence overlapping joiner) --
>gb|M14428.1|S13CG Bacteriophage S13 circular DNA, complete genome
Length=5386
Score = 1068 bits (1184), Expect = 0.0
Identities = 597/600 (99%), Gaps = 0/600 (0%)
Strand=Plus/Plus

$ ./test_phiX.sh
Checking full Ray run with phiX genome...
5000 Reads simulated...
Running Ray...
success (match in forward direction)!

This is doing things that Ray can already do (i.e. read in base-space reads, assemble, output as base-space), but I need to make sure it can do at least that before trying other things. It's also slower than sebhtml/ray (Total: 25 seconds vs 14 seconds; 57 seconds if I include asserts and debug symbols).
I presume this is because of the additional complexity for getOutgoingEdges / getLastCode (it needs to re-calculate the last base each time by iterating through the k-mer). Perhaps the k-mer should store the last base as well as the first...

-- David

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
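
For anyone following along, here is a rough sketch of the "store the last base as well as the first" idea. It is not Ray's Kmer class: ColourKmer, encodeBase, lastBaseByIteration, lastBaseCached and shiftAppend are all hypothetical names made up for illustration, and the storage (a std::deque of colours) is an assumption chosen for readability, not speed. The only thing it relies on is the usual colour-space coding, where with A=0, C=1, G=2, T=3 each colour is the XOR of two adjacent base codes, so the last base is the first base XORed with every colour in the k-mer.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <stdexcept>
#include <string>

// Hypothetical helper: map a base to its 2-bit code (A=0, C=1, G=2, T=3).
inline std::uint8_t encodeBase(char base) {
    switch (base) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default: throw std::invalid_argument("expected A, C, G or T");
    }
}

// Hypothetical colour-space k-mer, for illustration only (not Ray's Kmer).
// It keeps the first base, the k-1 colours, and a cached copy of the last base.
class ColourKmer {
public:
    explicit ColourKmer(const std::string& bases) {
        if (bases.empty()) throw std::invalid_argument("empty k-mer");
        m_firstBase = encodeBase(bases[0]);
        std::uint8_t previous = m_firstBase;
        for (std::size_t i = 1; i < bases.size(); ++i) {
            const std::uint8_t current = encodeBase(bases[i]);
            // A colour is the XOR of two adjacent base codes.
            m_colours.push_back(static_cast<std::uint8_t>(previous ^ current));
            previous = current;
        }
        m_lastBase = previous;
    }

    // O(k): recover the last base by walking every colour.
    std::uint8_t lastBaseByIteration() const {
        std::uint8_t base = m_firstBase;
        for (std::uint8_t colour : m_colours) {
            base = static_cast<std::uint8_t>(base ^ colour);
        }
        return base;
    }

    // O(1): return the stored last base.
    std::uint8_t lastBaseCached() const { return m_lastBase; }

    // Slide the k-mer one position forward by appending a colour.
    // The cached last base is updated with a single XOR.
    void shiftAppend(std::uint8_t colour) {
        if (m_colours.empty()) {
            // k = 1: the first and last base are the same base.
            m_firstBase = static_cast<std::uint8_t>(m_firstBase ^ colour);
        } else {
            m_firstBase = static_cast<std::uint8_t>(m_firstBase ^ m_colours.front());
            m_colours.pop_front();
            m_colours.push_back(colour);
        }
        m_lastBase = static_cast<std::uint8_t>(m_lastBase ^ colour);
    }

private:
    std::uint8_t m_firstBase;
    std::deque<std::uint8_t> m_colours;
    std::uint8_t m_lastBase;
};
```

The trade-off is two extra bits of state per k-mer against a loop over k-1 colours on every outgoing-edge query. Whether that accounts for the 25 vs 14 seconds above would need profiling; this only shows that the cached value can be kept consistent cheaply when the k-mer is extended.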