Hello, I found why k-mers longer than 32 were slow to compute with Ray 1.6.0.
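In short, the MPI rank of a long k-mer was derived from only the first 64-bit word of its storage; the details follow in the next paragraph. Here is a minimal sketch of that kind of bug and of one way to avoid it. It is not the actual Ray code: PackedKmer, mix64, rankFirstWordOnly, rankAllWords and distinctRanks are hypothetical names made up for illustration, and the mixing function is simply the SplitMix64 finalizer, chosen here as an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a k-mer packed at 2 bits per nucleotide:
// one uint64_t holds up to 32 nucleotides, so k > 32 needs at least 2 words.
struct PackedKmer {
    std::vector<uint64_t> words;
};

// 64-bit mixing function (the SplitMix64 finalizer), for illustration only.
uint64_t mix64(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// The kind of bug described in this message: only the first word is hashed,
// so k-mers that differ only in later words always land on the same rank.
int rankFirstWordOnly(const PackedKmer& kmer, int size) {
    assert(size > 0 && !kmer.words.empty());
    return static_cast<int>(mix64(kmer.words[0]) % static_cast<uint64_t>(size));
}

// One possible remedy (an assumption, not the real patch): fold every word
// into the hash before taking the remainder.
int rankAllWords(const PackedKmer& kmer, int size) {
    assert(size > 0 && !kmer.words.empty());
    uint64_t h = 0;
    for (uint64_t w : kmer.words) {
        h = mix64(h ^ w);
    }
    return static_cast<int>(h % static_cast<uint64_t>(size));
}

// Counts how many distinct ranks receive n k-mers that share the same first
// word and differ only in the second one.
int distinctRanks(bool useAllWords, int size, int n) {
    std::set<int> seen;
    for (int i = 0; i < n; ++i) {
        PackedKmer kmer;
        kmer.words = {0x1234ULL, static_cast<uint64_t>(i)};
        seen.insert(useAllWords ? rankAllWords(kmer, size)
                                : rankFirstWordOnly(kmer, size));
    }
    return static_cast<int>(seen.size());
}
```

With 8 ranks and 1000 such k-mers, distinctRanks(false, 8, 1000) returns 1, because every k-mer goes to the same rank, whereas the all-words variant should spread them over several ranks.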
vertexRank [core/common_functions.cpp] takes a Kmer and maps it to an MPI rank.

For k-mers longer than 32, at least 2 uint64_t (64-bit integers) are needed, and the hash for these longer k-mers was buggy because it was computed only on the first uint64_t, which broke the uniformity of the vertex distribution across MPI ranks.

This was fixed in https://github.com/sebhtml/ray/commit/e1309521f7c2e

So, yes, k-mers longer than 32 were not uniformly distributed across MPI ranks, and this led to slow assemblies because balanced communication between MPI ranks was not maintained.

This is now fixed.

Sébastien
http://twitter.com/sebhtml

On Thu, 2011-06-16 at 07:39 -0400, Adrian Platts, Mr wrote:
> Hi Sebastien,
>
> I'm replying to you directly rather than to the list as I get rather annoyed
> with lists where every minutiae of the debugging process goes to everyone
> on the list.
>
> > You said it was still in 'computing vertices and edges'.
> > Do you get updates or it looks hanged ?
>
> Yep, after 16 hours it moved on to the next stage, it was going very slowly
> (not frozen) so i killed it in order to adjust maxkmer down from 128 to 64.
> > OK, that must be something elsewhere.
> > Can you try with MAXKMERLENGTH=64 ?
>
> I recompiled with 64 and reran. It was definitely a bit faster, finishing
> the computing vertices and edges in about 8 hours
> (with the short kmers the whole assembly normally completes in under 3 hours
> - see output below).
>
> I left it running overnight and checked this morning. The processes had
> perhaps partially completed but only a few output files had been
> generated. It seems to have halted abnormally at some point, perhaps while
> computing contigs?
>
> [aplatts@grandiflora Ray-1.6.0]$ ls -alt
> total 472
> -rw-rw-r-- 1 aplatts aplatts 1523 Jun 15 19:17 SI_61_MP28065MPSI_Scaff.LibraryStatistics.txt
> drwxr-xr-x 8 aplatts aplatts 4096 Jun 15 19:17 .
> -rw-rw-r-- 1 aplatts aplatts 42080 Jun 15 19:07 SI_61_MP28065MPSI_Scaff.SeedLengthDistribution.txt
> -rw-rw-r-- 1 aplatts aplatts 329 Jun 15 16:10 SI_61_MP28065MPSI_Scaff.CoverageDistributionAnalysis.txt
> -rw-rw-r-- 1 aplatts aplatts 110604 Jun 15 16:10 SI_61_MP28065MPSI_Scaff.CoverageDistribution.txt
> -rw-rw-r-- 1 aplatts aplatts 517 Jun 15 11:32 SI_61_MP28065MPSI_Scaff.RayCommand.txt
> -rw-rw-r-- 1 aplatts aplatts 19 Jun 15 11:32 SI_61_MP28065MPSI_Scaff.RayVersion.txt
> -rw-rw-r-- 1 aplatts aplatts 9 Jun 15 11:31 TARGETS
> -rw-rw-r-- 1 aplatts aplatts 17 Jun 15 11:31 PREFIX
> drwxr-xr-x 12 aplatts aplatts 4096 Jun 15 11:31 code
> -rw-rw-r-- 1 aplatts aplatts 0 Jun 15 11:30 showOptions
> drwxrwxr-x 2 aplatts aplatts 99 Jun 14 13:22 Ray-Large-k-mers
> drwxrwxrwx 41 aplatts aplatts 4096 Jun 14 13:10 ..
>
> Sorry, I wasn't collecting the output during this run - I guess I should
> rerun it? I have the exact same run with k=31 (1.4.0) where
> there weren't any problems:
>
> -rw-rw-r-- 1 aplatts aplatts 223547921 May 31 10:21 SI_31_MP28065MPSI_Scaff.Scaffolds.fasta
> -rw-rw-r-- 1 aplatts aplatts 285 May 31 10:20 SI_31_MP28065MPSI_Scaff.OutputNumbers.txt
> -rw-rw-r-- 1 aplatts aplatts 1877068 May 31 10:20 SI_31_MP28065MPSI_Scaff.ScaffoldComponents.txt
> -rw-rw-r-- 1 aplatts aplatts 625703 May 31 10:20 SI_31_MP28065MPSI_Scaff.ScaffoldLengths.txt
> -rw-rw-r-- 1 aplatts aplatts 914872 May 31 10:20 SI_31_MP28065MPSI_Scaff.ContigLengths.txt
> -rw-rw-r-- 1 aplatts aplatts 511876 May 31 10:20 SI_31_MP28065MPSI_Scaff.ScaffoldLinks.txt
> -rw-rw-r-- 1 aplatts aplatts 220073016 May 31 10:15 SI_31_MP28065MPSI_Scaff.Contigs.fasta
> -rw-rw-r-- 1 aplatts aplatts 1523 May 31 09:45 SI_31_MP28065MPSI_Scaff.LibraryStatistics.txt
> -rw-rw-r-- 1 aplatts aplatts 54734 May 31 09:42 SI_31_MP28065MPSI_Scaff.SeedLengthDistribution.txt
> -rw-rw-r-- 1 aplatts aplatts 171 May 31 08:46 SI_31_MP28065MPSI_Scaff.CoverageDistributionAnalysis.txt
> -rw-rw-r-- 1 aplatts aplatts 189495 May 31 08:46 SI_31_MP28065MPSI_Scaff.CoverageDistribution.txt
> -rw-rw-r-- 1 aplatts aplatts 500 May 31 08:02 SI_31_MP28065MPSI_Scaff.RayCommand.txt
> -rw-rw-r-- 1 aplatts aplatts 19 May 31 08:02 SI_31_MP28065MPSI_Scaff.RayVersion.txt
>
> I couldn't see many differences in the output files, the percentage of
> vertices with coverage 1 was very slightly
> higher in the k=61 run (32% v 30% - these both seem high?) but not by much.
>
> I guess it could have run out of memory but 256GB was available and when I
> left it only about 32 GB was being used by
> Ray and I don't see anything in the kernel messages about alloc failures or
> OOM killer activity.
>
> Adrian

_______________________________________________
Denovoassembler-users mailing list
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
