Hi Matthew,

Because Ray runs in parallel, some parts of the distributed graph are
assembled more than once. The 'Computing fusions' step attempts to remove
this redundancy. I checked, and the extra 500 Mb comes from paths that
failed to merge. This failure also gives the Ray scaffolder a hard time
(these paths will not scaffold). I have made a change to Ray to correct this.

Hope that helps.

Sébastien

> ________________________________________
> From: Matthew MacManes [macma...@gmail.com]
> Sent: August 9, 2011 18:06
> To: Sébastien Boisvert
> Cc: denovoassembler-users@lists.sourceforge.net
> Subject: Re: [Denovoassembler-users] Updates on Assemblathon 2
>
> Hey Seb,
>
> Any update on the parrot assembly? What is N50? I notice that you're
> reconstructing ~1.7 Gb, about 0.5 Gb too much. What do you make of that?
>
> Matt
>
> On Jul 13, 2011, at 16:45, Sébastien Boisvert
> <sebastien.boisver...@ulaval.ca> wrote:
>
>> Dear list,
>>
>> For those who follow Assemblathon 2, here is my last run on my testbed
>> (Illumina data from BGI and from Illumina UK for the Bird/Parrot):
>>
>> (All mate-pairs failed detection because of the many peaks in each
>> library; I will modify Ray to take that into account.)
>>
>> Total number of unfiltered Illumina TruSeq v3 sequences: 3 072 136 294,
>> that is ~3 G sequences!
>>
>> 512 compute cores (64 computers * 8 cores/computer = 512)
>>
>> Typical communication profile for one compute core:
>>
>> [1,0]<stdout>:Rank 0: sent 249841326 messages, received 249840303 messages.
>>
>> Yes, each core sends an average of 250 M messages during the 18 hours!
>>
>> Peak memory usage per core: 2.2 GiB
>> Peak memory usage (distributed in a peer-to-peer fashion): 1100 GiB
>>
>> The peak occurs around 3 hours and goes down to 1.1 GiB per node
>> immediately because the pool of defragmentation groups for k-mers
>> occurring once is freed.
>>
>> The compute cluster I use has 3 GiB per compute core. So using 2048
>> compute cores would give me 6144 GiB of distributed memory.
>>
>> Number of contigs: 550764
>> Total length of contigs: 1672750795
>> Number of contigs >= 500 nt: 501312
>> Total length of contigs >= 500 nt: 1656776315
>> Number of scaffolds: 510607
>> Total length of scaffolds: 1681345451
>> Number of scaffolds >= 500 nt: 463741
>> Total length of scaffolds >= 500: 1666464367
>>
>> k-mer length: 31
>> Lowest coverage observed: 1
>> MinimumCoverage: 42
>> PeakCoverage: 171
>> RepeatCoverage: 300
>> Number of k-mers with at least MinimumCoverage: 2453479388 k-mers
>> Estimated genome length: 1226739694 nucleotides
>> Percentage of vertices with coverage 1: 83.7771 %
>> DistributionFile: parrot-Testbed-A2-k31-20110712.CoverageDistribution.txt
>>
>> [1,0]<stdout>: Sequence partitioning: 1 hours, 54 minutes, 47 seconds
>> [1,0]<stdout>: K-mer counting: 5 hours, 47 minutes, 20 seconds
>> [1,0]<stdout>: Coverage distribution analysis: 30 seconds
>> [1,0]<stdout>: Graph construction: 2 hours, 52 minutes, 27 seconds
>> [1,0]<stdout>: Edge purge: 57 minutes, 55 seconds
>> [1,0]<stdout>: Selection of optimal read markers: 1 hours, 38 minutes, 13 seconds
>> [1,0]<stdout>: Detection of assembly seeds: 16 minutes, 7 seconds
>> [1,0]<stdout>: Estimation of outer distances for paired reads: 6 minutes, 26 seconds
>> [1,0]<stdout>: Bidirectional extension of seeds: 3 hours, 18 minutes, 6 seconds
>> [1,0]<stdout>: Merging of redundant contigs: 15 minutes, 45 seconds
>> [1,0]<stdout>: Generation of contigs: 1 minutes, 41 seconds
>> [1,0]<stdout>: Scaffolding of contigs: 54 minutes, 3 seconds
>> [1,0]<stdout>: Total: 18 hours, 3 minutes, 50 seconds
>>
>> 10 largest scaffolds:
>>
>> 257646
>> 266905
>> 268737
>> 272828
>> 281502
>> 294105
>> 294106
>> 296978
>> 333171
>> 397201
>>
>> # average latency in microseconds (10^-6 seconds) when requesting a reply
>> # for a message of 4000 bytes
>> # Message passing interface rank    Name        Latency in microseconds
>> 0     r107-n24    138
>> 1     r107-n24    140
>> 2     r107-n24    140
>> 3     r107-n24    140
>> 4     r107-n24    141
>> 5     r107-n24    141
>> 6     r107-n24    140
>> 7     r107-n24    140
>> 8     r107-n25    140
>> 9     r107-n25    139
>> 10    r107-n25    138
>> 11    r107-n25    139
>>
>> Sébastien
>> _______________________________________________
>> Denovoassembler-users mailing list
>> Denovoassembler-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users

Sébastien Boisvert
http://github.com/sebhtml/ray
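
For anyone who wants to check the figures quoted in this thread, here is a small Python sketch of the arithmetic. The constants are copied from the run report above. The function names are hypothetical, and halving the k-mer count to get the genome length is an assumption read off the two numbers in the report (each position is counted once per strand); it is not a description of Ray's internal code.

```python
# Figures copied from the run report quoted in this thread.
KMERS_AT_MIN_COVERAGE = 2_453_479_388   # k-mers with at least MinimumCoverage
TOTAL_CONTIG_LENGTH = 1_672_750_795     # total length of contigs, in nt
CORES_USED = 512
PEAK_GIB_PER_CORE = 2.2
GIB_AVAILABLE_PER_CORE = 3


def estimated_genome_length(kmer_count: int) -> int:
    """Assumption: both strands are stored, so every genomic position
    contributes two k-mers and the count is halved."""
    return kmer_count // 2


def distributed_memory_gib(cores: int, gib_per_core: float) -> float:
    """Aggregate memory across all cores of a distributed run."""
    return cores * gib_per_core


genome = estimated_genome_length(KMERS_AT_MIN_COVERAGE)
excess = TOTAL_CONTIG_LENGTH - genome

print("Estimated genome length (nt):", genome)
print("Assembled beyond the estimate (nt):", excess)
print("Peak distributed memory (GiB):",
      distributed_memory_gib(CORES_USED, PEAK_GIB_PER_CORE))
print("Memory with 2048 cores (GiB):",
      distributed_memory_gib(2048, GIB_AVAILABLE_PER_CORE))
```

Under that assumption the halved k-mer count matches the reported estimate of 1226739694 nucleotides, and the contigs exceed it by roughly 0.45 Gb, which is in line with the "about 0.5 Gb too much" that Matt noticed.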