[Denovoassembler-users] RE: Confused about coding -- completed seeds without distributions

Sébastien Boisvert Wed, 13 Jul 2011 08:50:56 -0700

Well, as I understand it, you did not follow one simple rule of version control:


* Never commit broken code.

To put it simply: if you commit broken code, nobody will pull from you!


It becomes very difficult (and at some point impossible) to track which commit 
introduced a regression.

You should think of a code repository as a store of changes that are tested 
and mostly bug-free.

Then finding the commit that introduced a bug with 'git bisect' is very fast.
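To illustrate why bisecting is fast, here is a minimal Python sketch of the binary search that 'git bisect' automates over a linear history. The commit list and the is_good callback are hypothetical stand-ins for checking out a commit, building it and running the tests. The sketch assumes every commit in the range can be built and tested; that is exactly the assumption broken commits violate (git offers 'git bisect skip' for untestable commits, but every skip costs extra steps and can leave the culprit ambiguous).

```python
from typing import Callable, Sequence, Tuple


def first_bad_commit(commits: Sequence[str],
                     is_good: Callable[[str], bool]) -> Tuple[str, int]:
    """Find the first commit for which is_good() is False.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit in between can be built and tested (no broken commits).
    Returns (first bad commit, number of tests run).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_good(commits[mid]):
            lo = mid  # the regression came after mid
        else:
            hi = mid  # the regression is at mid or before it
    return commits[hi], tests


if __name__ == "__main__":
    # Hypothetical history c0..c1024 where the regression appears in c700.
    history = [f"c{i}" for i in range(1025)]
    culprit, n_tests = first_bad_commit(history, lambda c: int(c[1:]) < 700)
    print(culprit, n_tests)  # c700 10
```

With a known-good commit and a known-bad commit 1024 commits later, the search needs only 10 tests (log2 of 1024), instead of testing commits one by one.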


Basically, you must search your own commits to find where you screwed up. But 
you cannot do that, because you are committing broken code to your fork.


As Linus Torvalds would say: commit early, commit often.


Furthermore, I am not sure that all the emails about you debugging the bugs you 
introduced in your forks are of general interest to denovoassembler-users.

I have therefore created a development mailing list (it should be up within 24 
hours at most).

https://lists.sourceforge.net/lists/listinfo/denovoassembler-devel


Seeds are basically paths in the k-mer graph. So if your fork cannot compute 
any seeds, the problem is probably in the connectivity of the vertices.
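To make the point about connectivity concrete, here is a minimal Python sketch, not Ray's implementation. It treats every k-mer as a vertex, adds an edge between consecutive k-mers of a read, and takes "seeds" to be maximal non-branching paths. That seed definition is a simplifying assumption (whatever additional criteria Ray applies are not reproduced), and kmer_graph, non_branching_paths and spell are hypothetical names. The relevant property holds either way: a path needs edges, so if successor lookups fail (hypothetically, because k-mers are stored under one encoding and neighbours are looked up under another), every vertex is isolated and no seed can be built.

```python
from collections import defaultdict
from typing import Dict, List, Set


def kmer_graph(reads: List[str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer (vertex) to the k-mers that follow it in some read (edges)."""
    succ: Dict[str, Set[str]] = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            outs = succ.setdefault(read[i:i + k], set())
            if i + k < len(read):
                # The next k-mer overlaps this one by k - 1 characters.
                outs.add(read[i + 1:i + k + 1])
    return succ


def non_branching_paths(succ: Dict[str, Set[str]]) -> List[List[str]]:
    """Return maximal paths whose inner vertices have in-degree 1 and out-degree 1.

    Isolated cycles are ignored to keep the sketch short.
    """
    indeg: Dict[str, int] = defaultdict(int)
    for outs in succ.values():
        for w in outs:
            indeg[w] += 1

    def one_in_one_out(v: str) -> bool:
        return indeg[v] == 1 and len(succ.get(v, ())) == 1

    paths: List[List[str]] = []
    for v in succ:
        if one_in_one_out(v):
            continue  # inner vertex: reached by walking from some path start
        for w in succ[v]:
            path = [v, w]
            while one_in_one_out(w):
                w = next(iter(succ[w]))
                path.append(w)
            paths.append(path)
    return paths


def spell(path: List[str]) -> str:
    """Turn a path of overlapping k-mers back into a sequence."""
    return path[0] + "".join(kmer[-1] for kmer in path[1:])


if __name__ == "__main__":
    graph = kmer_graph(["ACGTTGCA"], k=3)
    print([spell(p) for p in non_branching_paths(graph)])  # ['ACGTTGCA']
    # Same vertices, but every edge lost: nothing can be extended, no seeds.
    isolated = {v: set() for v in graph}
    print(non_branching_paths(isolated))  # []
```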

Can you post the contents of RayOutput.degreeDistribution.txt to 
denovoassembler-devel once it is online in a few hours?
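A degree distribution is a quick way to check whether the graph is connected at all. The Python sketch below (a hypothetical degree_distribution function applied to a hand-made graph; it does not reproduce the format of RayOutput.degreeDistribution.txt) counts vertices by (in-degree, out-degree). For reads from a mostly non-repetitive genome, one would expect most vertices at (1, 1); if almost everything lands at (0, 0), the vertices exist but the edges between them are missing.

```python
from collections import Counter
from typing import Dict, Set


def degree_distribution(succ: Dict[str, Set[str]]) -> Counter:
    """Count vertices by (in-degree, out-degree) in a successor-set graph."""
    indeg: Counter = Counter()
    for outs in succ.values():
        indeg.update(outs)  # each successor gains one incoming edge
    return Counter((indeg[v], len(outs)) for v, outs in succ.items())


if __name__ == "__main__":
    healthy = {"ACG": {"CGT"}, "CGT": {"GTT"}, "GTT": {"TTG"}, "TTG": set()}
    broken = {v: set() for v in healthy}  # same vertices, no edges
    for name, graph in (("healthy", healthy), ("broken", broken)):
        print(name)
        for (d_in, d_out), n in sorted(degree_distribution(graph).items()):
            print(f"  in={d_in} out={d_out} vertices={n}")
```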






                                                     Sébastien

> ________________________________________
> From: David Eccles (gringer) [david.ecc...@mpi-muenster.mpg.de]
> Sent: 12 July 2011 10:49
> To: denovoassembler-users@lists.sourceforge.net
> Subject: [Denovoassembler-users] Confused about coding -- completed seeds
>   without distributions
> 
> Well, I'm now a bit stuck on my colour-space Kmers/Reads modifications,
> sorry. I don't think I can progress further on this until I understand
> more about the run sequence so that I can work out where it's going
> wrong. I'm currently just trying to get the program to assemble / output
> in base-space (although the hash ID is calculated from the colour-space
> sequence), but even that isn't working.
> 
> Here's what I think are the important bits of the Ray output:
> 
> $ ../code/Ray --debug-seeds -s phix_1.fasta | grep -e complete -e workers
> Rank 0 has 1000 sequence reads (completed)
> Rank 0 is counting k-mers in sequence reads [1000/1000] (completed)
> Rank 0 has 10434 k-mers (completed)
> Rank 0 is computing vertices & edges [1000/1000] (completed)
> Rank 0 has 10200 vertices (completed)
> Rank 0 is purging edges [10200/10200] (completed)
> Rank 0: peak number of workers: 500, maximum: 30000
> Rank 0 is selecting optimal read markers [1000/1000] (completed)
> Rank 0: peak number of workers: 500, maximum: 30000
> Rank 0 is creating seeds [10200/10200] (completed)
> Rank 0: peak number of workers: 500, maximum: 30000
> Rank 0 is calculating library lengths [0/0] (completed)
> Rank 0: peak number of workers: 0, maximum: 30000
> Rank 0 is extending seeds [0/0] (completed)
> Rank 0 is computing fusions [0/0] (completed)
> Rank 0 is distributing fusions [0/0] (completed)
> Rank 0 is finishing fusions [0/0] (completed)
> Rank 0 is distributing fusions [0/0] (completed)
> Rank 0 is computing fusions [0/0] (completed)
> Rank 0 is distributing fusions [0/0] (completed)
> Rank 0 is finishing fusions [0/0] (completed)
> Rank 0 is distributing fusions [0/0] (completed)
> 
> Output file list:
> 0 Jul 12 16:40 RayOutput.ContigLengths.txt
> 0 Jul 12 16:40 RayOutput.Contigs.fasta
> 305 Jul 12 16:40 RayOutput.CoverageDistributionAnalysis.txt
> 100 Jul 12 16:40 RayOutput.CoverageDistribution.txt
> 578 Jul 12 16:40 RayOutput.degreeDistribution.txt
> 88 Jul 12 16:40 RayOutput.LibraryStatistics.txt
> 5.5K Jul 12 16:40 RayOutput.MessagePassingInterface.txt
> 177 Jul 12 16:40 RayOutput.NetworkTest.txt
> 237 Jul 12 16:40 RayOutput.OutputNumbers.txt
> 67 Jul 12 16:40 RayOutput.RayCommand.txt
> 25 Jul 12 16:40 RayOutput.RayVersion.txt
> 0 Jul 12 16:40 RayOutput.ScaffoldComponents.txt
> 0 Jul 12 16:40 RayOutput.ScaffoldLengths.txt
> 0 Jul 12 16:40 RayOutput.ScaffoldLinks.txt
> 0 Jul 12 16:40 RayOutput.Scaffolds.fasta
> 0 Jul 12 16:40 RayOutput.SeedLengthDistribution.txt
> 
> It produced reasonable-looking numbers in the coverage distribution
> file (and the coverage analysis file) after I changed the smoothing function
> to behave more like an actual smoothing function
> (https://github.com/gringer/ray/commit/f048203e571d4c7a267dace27fec063cfe2059dd;
> ignore the 'push_back(0)' bits).
> 
> However, my code doesn't seem to find any seeds.
> 
> Oh, and FWIW, all the unit tests are passing [except for my new Ray
> assembly run system test].
> 
> Any ideas on where I should look?
> 
> Thanks,
> 
> David Eccles (gringer)
> 
> _______________________________________________
> Denovoassembler-users mailing list
> Denovoassembler-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
> 