On 13/07/11 17:01, David Eccles (gringer) wrote:
> Um... what? So... the first few workers that get added don't actually do
> any work, and the activeWorkers array isn't populated until the first 5
> vertices are loaded. This agrees with what's on lines 138-144:
>
> int coverage=node->getCoverage(&vertexKey);
> int minimum=5;
> if(coverage<minimum){
>     m_completedJobs++;
> }else{
>     m_aliveWorkers[m_SEEDING_i].constructor(&vertexKey,m_parameters,
>         m_outboxAllocator,m_virtualCommunicator,m_SEEDING_i);
>     m_activeWorkers.insert(m_SEEDING_i);
> }

If I reduce that minimum down to 2 [i.e. changing 
that-which-should-not-be-changed], and change one of the output strings 
to base-space [it was previously displaying the root k-mer in 
colour-space], then I get a successful assembly, both for sebhtml/ray, 
and for gringer/ray.
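
To illustrate why the threshold matters, here is a minimal sketch of the same branch applied to a batch of vertices. It is not Ray's code: `SeedingTally` and `tallySeeding` are hypothetical names, and the coverage values in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of what the seeding branch would do for a batch
// of vertices (these names are illustrative, not part of Ray).
struct SeedingTally {
    std::size_t completedJobs = 0;  // vertices skipped as low-coverage
    std::size_t activeWorkers = 0;  // vertices that would get a worker
};

// Apply the same test as the quoted snippet: a vertex whose coverage is
// below `minimum` is counted as completed without spawning a worker.
SeedingTally tallySeeding(const std::vector<int>& coverages, int minimum) {
    assert(minimum >= 0);
    SeedingTally tally;
    for (int coverage : coverages) {
        if (coverage < minimum) {
            ++tally.completedJobs;
        } else {
            ++tally.activeWorkers;
        }
    }
    return tally;
}
```

With coverages of, say, {2, 3, 4, 2, 3}, a minimum of 5 leaves every vertex in `completedJobs` and no active workers, while a minimum of 2 turns all five into workers.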

$ wc -l test_phiX.Contigs.fasta testSeb_phiX_5k.Contigs.fasta
    90 test_phiX.Contigs.fasta
    90 testSeb_phiX_5k.Contigs.fasta
   180 total
$ diff test_phiX.Contigs.fasta testSeb_phiX_5k.Contigs.fasta
[no output]
$ fasta_formatter -i ../tests/phix/phix.fasta | fastx_reverse_complement \
    | grep $(fasta_formatter -i test_phiX.Scaffolds.fasta | grep -v '^>') \
    > /dev/null && echo "success (match in reverse direction)"
success (match in reverse direction)
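
The same check can be sketched in C++ for anyone without the FASTX tools at hand. This is only an illustration of the idea behind the pipeline above: `reverseComplement` and `matchDirection` are hypothetical helpers, sequences are assumed to be upper-case ACGT strings with headers already stripped, and (like the grep) it treats the reference as linear, so a scaffold spanning the circular join would not match.

```cpp
#include <algorithm>
#include <string>

// Reverse-complement an upper-case DNA string; characters other than
// A, C, G, T are passed through as 'N'.
std::string reverseComplement(const std::string& sequence) {
    std::string result(sequence.rbegin(), sequence.rend());
    std::transform(result.begin(), result.end(), result.begin(),
                   [](char base) {
                       switch (base) {
                           case 'A': return 'T';
                           case 'C': return 'G';
                           case 'G': return 'C';
                           case 'T': return 'A';
                           default:  return 'N';
                       }
                   });
    return result;
}

// Report whether `scaffold` occurs in `reference` as-is ("forward"),
// in its reverse complement ("reverse"), or not at all ("none").
std::string matchDirection(const std::string& reference,
                           const std::string& scaffold) {
    if (reference.find(scaffold) != std::string::npos) {
        return "forward";
    }
    if (reverseComplement(reference).find(scaffold) != std::string::npos) {
        return "reverse";
    }
    return "none";
}
```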

My code now does a correct assembly both on simulated and on real 
(circularised) phiX data:

$ ../code/Ray -s \
    ~/illuminadata/110517_H134_0062_A816HKABXX/Data/Intensities/BaseCalls/fastq/ignore/head100k_interleaved_noN_phiX_region1101.fasta \
    -o ray_output/test2_phiX_circular
...
Number of contigs: 1
Total length of contigs: 5385
Number of contigs >= 500 nt: 1
Total length of contigs >= 500 nt: 5385
Number of scaffolds: 1
Total length of scaffolds: 5385
Number of scaffolds >= 500 nt: 1
Total length of scaffolds >= 500: 5385
-- BLAST results (for 600bp sequence overlapping joiner) --
 >gb|M14428.1|S13CG  Bacteriophage S13 circular DNA, complete genome
Length=5386
  Score = 1068 bits (1184),  Expect = 0.0
  Identities = 597/600 (99%), Gaps = 0/600 (0%)
  Strand=Plus/Plus

$ ./test_phiX.sh
Checking full Ray run with phiX genome... 5000 Reads simulated... 
Running Ray... success (match in forward direction)!

This is doing things that Ray can already do (i.e. read in base-space 
reads, assemble, output as base-space), but I need to make sure it can 
do at least that before trying other things.

It's also slower than sebhtml/ray (total run time 25 seconds vs 14 
seconds, or 57 seconds if I include asserts and debug symbols). I 
presume this is because of the additional complexity in 
getOutgoingEdges / getLastCode (it needs to re-calculate the last base 
each time by iterating through the k-mer). Perhaps the k-mer should 
store the last base as well as the first.
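
A rough sketch of that caching idea, under stated assumptions: this is not the k-mer class in gringer/ray. It assumes bases coded as A=0, C=1, G=2, T=3, under which a colour is the XOR of two adjacent base codes, and `ColourKmer`, `lastBaseByIteration`, `makeKmer` and `extend` are hypothetical names. A `std::vector` stands in for whatever packed storage the real code uses.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical colour-space k-mer: first base, k-1 colours, and a
// cached last base. Base codes: A=0, C=1, G=2, T=3.
struct ColourKmer {
    std::uint8_t firstBase;
    std::vector<std::uint8_t> colours;
    std::uint8_t lastBase;  // cached so it need not be recomputed
};

// The slow path: walk every colour from the first base. O(k) per call.
std::uint8_t lastBaseByIteration(std::uint8_t firstBase,
                                 const std::vector<std::uint8_t>& colours) {
    std::uint8_t base = firstBase;
    for (std::uint8_t colour : colours) {
        assert(colour < 4);
        base = static_cast<std::uint8_t>(base ^ colour);
    }
    return base;
}

// Build a k-mer, paying the O(k) walk once at construction time.
ColourKmer makeKmer(std::uint8_t firstBase,
                    std::vector<std::uint8_t> colours) {
    std::uint8_t last = lastBaseByIteration(firstBase, colours);
    return ColourKmer{firstBase, std::move(colours), last};
}

// Slide the k-mer one position along an outgoing edge. Both the first
// and the last base are updated with a single XOR each, no iteration.
ColourKmer extend(const ColourKmer& kmer, std::uint8_t newColour) {
    assert(!kmer.colours.empty() && newColour < 4);
    ColourKmer next;
    next.firstBase =
        static_cast<std::uint8_t>(kmer.firstBase ^ kmer.colours.front());
    next.colours.assign(kmer.colours.begin() + 1, kmer.colours.end());
    next.colours.push_back(newColour);
    next.lastBase = static_cast<std::uint8_t>(kmer.lastBase ^ newColour);
    return next;
}
```

The trade-off is two extra bits of storage per k-mer in exchange for a constant-time last-base lookup. Whether that accounts for the timing difference would still need profiling.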

-- David

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users