On 23/09/13 11:44 AM, Nathaniel Jue wrote:
> Sebastien,
>
> Okay, if I'm reading the output correctly, this should be the memory
> allocation in KiB (is that bits or bytes? Or Kibibytes? I'm not sure) for
> each rank (which should basically be each processor, right?). Reads
> allocated to each rank were 32868106.

Hi Nathaniel,

The Bloom filter is a component that absorbs most of the sequencing errors in order to reduce the memory consumed by the graph. The size of the Bloom filter is a linear function of the number of reads because the number of sequencing errors is also a linear function of the number of reads.

Your log says "-109661072 bytes of type RAY_MALLOC_TYPE_BLOOM_FILTER". A negative byte count means that there is a bug in the code that causes an integer overflow.

The number of bits for the Bloom filter is (as of Ray v2.3.0-devel):

    Bits = NumberOfReads * 4 * 2 * 2 * KmerLength

In your case:

    Bits = 32868106 * 4 * 2 * 2 * 31 = 16 302 580 576 bits (2 037 822 572 bytes)

So this is a bug in Ray: 16 302 580 576 is larger than the largest value a 32-bit integer can hold (4 294 967 295 unsigned).

But I think that 657362120 reads (20 ranks * 32868106 reads / rank) is a lot of reads. You may want to scale out a little bit on this (add more cores).

I created a ticket here: https://github.com/sebhtml/ray/issues/196

However, at the moment I am busy with postdoctoral scholarship applications.

Here are 2 possible workarounds that I can offer:

1. Set the number of bits manually with -bloom-filter-bits. For example, to use 512 MiB of memory for the Bloom filter on each MPI rank, use -bloom-filter-bits 4294967296 (512 MiB = 536 870 912 bytes = 4 294 967 296 bits).

2. Use more MPI ranks (more processor cores). If you go from 20 to 200 MPI ranks, each rank receives about one tenth of the reads, so the Bloom filter on each rank is smaller and the run should be faster.

Séb

>
> Rank-KiB
>
> 0-3008256
> 1-2074124
> 2-1926640
> 3-1926636
> 4-1926640
> 5-1926636
> 6-1074460
> 7-1926636
> 8-1926636
> 9-1926640
> 10-1910248
> 11-1074460
> 12-1926640
> 13-1926640
> 14-1926640
> 15-1926640
> 16-1074460
> 17-1926640
> 18-1926640
> 19-1910248
>
> Rough estimate adds up to around 40GB, right?
>
> Cheers,
> Nate
>
> On Mon, Sep 23, 2013 at 9:36 AM, Sébastien Boisvert
> <sebastien.boisver...@ulaval.ca> wrote:
>
> On 22/09/13 10:13 PM, Nathaniel Jue wrote:
>
> Hi Sebastien,
>
> By job.log, I assume you mean the stdout.
> I tried grepping "BloomFilter" in both that output and all the files
> created in the output directory. The only instances of the term
> BloomFilter occurred in the test error example I sent you earlier, which
> occurs right after loading all the reads. Do you think this is a
> too-many-reads issue or something else? If so, any suggestions on how to
> deal with that?
>
> (Please use the list.)
>
> It would be informative to get the number of bits that the Bloom filter is
> trying to allocate; this information can be found in the standard output.
>
> Regards,
> Nate
>
> On Sep 20, 2013 5:19 PM, "Sébastien Boisvert"
> <sebastien.boisver...@ulaval.ca> wrote:
>
> On 20/09/13 05:06 PM, Nathaniel Jue wrote:
>
> Hi,
>
> I've run into a bit of an issue with Ray (v2.3.0-devel) and was wondering
> if you might be able to give me some advice/help. I keep on getting this
> error message when I try to run Ray with all of my data with the following
> command (the ellipsis stands for the first two lines of the error being
> repeated 29 more times, once for each processor, or all 30 processors in
> total).
> There is quite a bit of data in the analysis (2 runs of MiSeq data and 2
> lanes of HiSeq):
>
> mpiexec -n 20 Ray -k 31
> -p /data2/reads/illumina/limulus/GRL1397_S1_L001.left.rept.corr.fasta
> /data2/reads/illumina/limulus/GRL1397_S1_L001.right.rept.corr.fasta
> -p /data2/reads/illumina/limulus/GRL1402_errorCorrect/GRL1402.left.2.rept.corr.fasta
> /data2/reads/illumina/limulus/GRL1402_errorCorrect/GRL1402.right.rept.corr.fasta
> -s /data2/reads/illumina/limulus/GRL1397_S1_L001.up.rept.corr.fasta
> -s /data2/reads/illumina/limulus/GRL1402_errorCorrect/GRL1402.up.rept.corr.fasta
> -p /data2/reads/illumina/limulus/8871_CGATGT_L003_errorCorrect/8871_CGATGT_L003.left.2.rept.corr.fasta
> /data2/reads/illumina/limulus/8871_CGATGT_L003_errorCorrect/8871_CGATGT_L003.right.rept.corr.fasta
> -p /data2/reads/illumina/limulus/8871_CGATGT_L004_errorCorrect/8871_CGATGT_L004.left.2.rept.corr.fasta
> /data2/reads/illumina/limulus/8871_CGATGT_L004_errorCorrect/8871_CGATGT_L004.right.rept.corr.fasta
> -s /data2/reads/illumina/limulus/8871_CGATGT_L003_errorCorrect/8871_CGATGT_L003.up.rept.corr.fasta
> -s /data2/reads/illumina/limulus/8871_CGATGT_L004_errorCorrect/8871_CGATGT_L004.up.rept.corr.fasta
> -o limulus_ray_IlluminaOnly
>
> Subsequent error message:
>
> Critical exception: The system is out of memory, returned NULL.
> Requested -109661072 bytes of type RAY_MALLOC_TYPE_BLOOM_FILTER
>
> So you are getting this with the git version of Ray. Strange.
>
> ...
>
> --------------------------------------------------------------------------
> mpiexec has exited due to process rank 8 with PID 22018 on
> node redqueen exiting improperly. There are two reasons this
> could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init".
> By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpiexec (as reported here).
> --------------------------------------------------------------------------
>
> I did look in the Ray mailing list, installed the development version of
> the program and Ray Platform, and found discussion of a patch which I
> tried to apply to the program. When I did that, I got this:
>
> patching file code/VerticesExtractor/VerticesExtractor.cpp
> patching file code/VerticesExtractor/VerticesExtractor.h
>
> Reversed (or previously applied) patch detected! Assume -R? [n]
>
> You don't need to patch the code since the git repository includes these
> patches already.
>
> These patches are to be applied on Ray 2.2.0.
>
> which I have to respond "y" to in order to patch the program. Even after
> patching, though, the program still gives me this error. I will also add
> that when I tried an assembly with just the MiSeq data, the program was
> able to finish the assembly with the same 20 processors.
>
> I am using our supercomputer to do this assembly, which consists of 48
> Intel(R) Xeon(R) X7542 CPUs @ 2.67GHz (I think each has 6 cores, if I
> remember right; I'm not the hardware guy so not sure about that) with
> something like 500GB of RAM (again, I think).
>
> Do you have any thoughts or insight into what might be going on? MPI or
> Ray issue?
>
> The number of bytes for the Bloom filter depends on the number of reads,
> mostly. I think it is a bug in Ray that occurs when you have too many
> reads and not enough ranks.
>
> Can you search for BloomFilter in your log?
>
> You can do that with this command:
>
> grep BloomFilter job.log
>
> Regards,
> Nate

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
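
The Bloom filter arithmetic in the reply at the top of this thread can be checked with a short script. This is only a sketch: the sizing formula is the one quoted in the thread, but the overflow model is an assumption (a signed 32-bit bit count that is then converted to 64-bit words with C-style truncating division) and has not been checked against the Ray source. The function names are made up for illustration. Under that assumption the script reproduces the negative byte count from the error message.

```python
def bloom_filter_bits(reads_per_rank, kmer_length):
    """Sizing formula quoted in the thread (Ray v2.3.0-devel):
    Bits = NumberOfReads * 4 * 2 * 2 * KmerLength."""
    return reads_per_rank * 4 * 2 * 2 * kmer_length


def wrap_int32(value):
    """Keep the low 32 bits of value and read them as a signed integer."""
    return (value + 2**31) % 2**32 - 2**31


def trunc_div(a, b):
    """Integer division that truncates toward zero, as C does."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def requested_bytes_if_int32(reads_per_rank, kmer_length):
    """ASSUMPTION, not taken from Ray's code: the bit count is held in a
    signed 32-bit integer, then turned into 64-bit words of 8 bytes each."""
    wrapped_bits = wrap_int32(bloom_filter_bits(reads_per_rank, kmer_length))
    return trunc_div(wrapped_bits, 64) * 8


def mib_to_bits(mib):
    """Number of bits in a given number of MiB, e.g. for -bloom-filter-bits."""
    return mib * 1024 * 1024 * 8


if __name__ == "__main__":
    reads, k = 32868106, 31
    bits = bloom_filter_bits(reads, k)
    print("bits needed:        ", bits)        # 16302580576
    print("bytes needed:       ", bits // 8)   # 2037822572
    print("bytes if int32 wrap:", requested_bytes_if_int32(reads, k))  # -109661072
    print("bits in 512 MiB:    ", mib_to_bits(512))  # 4294967296
```

The third printed value matches the "Requested -109661072 bytes" line in the error message, which is consistent with the integer overflow diagnosis, although it does not prove where in the code the overflow happens.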