On 23/09/13 11:44 AM, Nathaniel Jue wrote:
> Sebastien,
>
> Okay, if I'm reading the output correctly, this should be the memory
> allocation in KiB (is that bits or bytes? Or Kibibytes? I'm not sure) for
> each rank (which should basically be each processor, right?). Reads
> allocated to each rank were 32868106.
Hi Nathaniel,
The Bloom filter is a component that absorbs most of the sequencing errors,
which reduces the memory consumed by the graph.
The size of the Bloom filter is a linear function of the number of reads because
the number of sequencing errors is also a linear function of the number of
reads.
Your log says "-109661072 bytes of type RAY_MALLOC_TYPE_BLOOM_FILTER".
A requested size can never be negative, so there is a bug in the code that
causes an integer overflow.
The number of bits for the Bloom filter is (as of Ray v2.3.0-devel):
Bits = NumberOfReads * 4 * 2 * 2 * KmerLength
In your case:
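Bits = 32868106 * 4 * 2 * 2 * 31 = 16 302 580 576 bits (2 037 822 572
bytes).
That bit count is larger than 2 147 483 647, the largest value a signed 32-bit
integer can hold, so the negative size in your log looks like a bug in Ray.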
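For illustration, here is a minimal Python sketch of how such an overflow could
produce the number in your log. It rests on two guesses that I have not checked
against the Ray source: that the bit count is held in a signed 32-bit integer,
and that the filter is allocated as 64-bit words. The function names are made
up for this example.

```python
def bloom_filter_bits(reads_per_rank, kmer_length):
    """Bit count from the formula quoted above (Ray v2.3.0-devel)."""
    return reads_per_rank * 4 * 2 * 2 * kmer_length


def wrap_int32(value):
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def bytes_as_64bit_words(bits):
    """Bytes used if the bits are stored in 64-bit words (assumption).

    Mimics C integer division, which truncates toward zero.
    """
    words = abs(bits) // 64
    if bits < 0:
        words = -words
    return words * 8


bits = bloom_filter_bits(32868106, 31)
wrapped = wrap_int32(bits)

print(bits)                           # 16302580576
print(bits // 8)                      # 2037822572
print(wrapped)                        # -877288608
print(bytes_as_64bit_words(wrapped))  # -109661072
```

Under those two assumptions the sketch ends at -109661072, the same value as in
the error message, which is consistent with a 32-bit overflow of the bit count.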
That said, I think that 657362120 reads (20 ranks * 32868106 reads / rank) is a
lot of reads for 20 ranks.
You may want to scale out a little on this (add more cores).
I created a ticket here: https://github.com/sebhtml/ray/issues/196
However, at the moment I am busy with postdoctoral scholarship applications.
Here are 2 possible workarounds that I can offer:
1. Set the number of bits manually with -bloom-filter-bits. For example, to use
512 MiB of memory for the Bloom filter on each MPI rank, use
-bloom-filter-bits 4294967296 (512 * 1024 * 1024 * 8 bits).
2. Use more MPI ranks (more processor cores).
If you go from 20 to 200 MPI ranks, each rank gets a tenth of the reads, so the
Bloom filter on each rank is smaller and everything will run faster.
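To make the two workarounds concrete, here is a small Python sketch. It only
does arithmetic: it converts a per-rank memory budget in MiB into a bit count
for -bloom-filter-bits, and it applies the formula above to different rank
counts, assuming the reads are spread evenly across ranks. The helper names are
hypothetical, and the 32-bit limit is an assumption about where the overflow
happens, not something taken from the Ray source.

```python
INT32_MAX = 2**31 - 1  # assumed limit of the integer holding the bit count


def mib_to_bits(mib):
    """Convert a memory budget in MiB to a number of bits."""
    return mib * 1024 * 1024 * 8


def bits_per_rank(total_reads, ranks, kmer_length):
    """Bloom filter bits per rank, with reads split evenly (rounded up)."""
    reads_per_rank = (total_reads + ranks - 1) // ranks
    return reads_per_rank * 4 * 2 * 2 * kmer_length


total_reads = 657362120  # 20 ranks * 32868106 reads / rank

print(mib_to_bits(512))  # 4294967296

for ranks in (20, 200):
    bits = bits_per_rank(total_reads, ranks, 31)
    print(ranks, bits, bits <= INT32_MAX)
# 20 16302580576 False
# 200 1630258256 True
```

With 200 ranks the bit count per rank fits in a signed 32-bit integer, so under
these assumptions the overflow would not be triggered.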
Séb
>
> Rank-KiB
>
> 0-3008256
> 1-2074124
> 2-1926640
> 3-1926636
> 4-1926640
> 5-1926636
> 6-1074460
> 7-1926636
> 8-1926636
> 9-1926640
> 10-1910248
> 11-1074460
> 12-1926640
> 13-1926640
> 14-1926640
> 15-1926640
> 16-1074460
> 17-1926640
> 18-1926640
> 19-1910248
>
> Rough estimate adds up to around 40GB, right?
>
> Cheers,
> Nate
>
> On Mon, Sep 23, 2013 at 9:36 AM, Sébastien Boisvert wrote:
>
> On 22/09/13 10:13 PM, Nathaniel Jue wrote:
>
> Hi Sebastien,
>
> By job.log, I assume you mean the stdout. I tried grepping
> "BloomFilter" in both that output and all the files created in the output
> directory. The only instances of the term BloomFilter occurred in the test
> error example I sent you earlier, which occurs right after loading all the
> reads. Do you think this is a too-many-reads issue or something else? If so,
> any suggestions on how to deal with that?
>
>
> (Please use the list.)
>
> It would be informative to get the number of bits that the Bloom filter is
> trying to allocate
> -- this is the information that can be found in the standard output.
>
>
> Regards,
> Nate
>
>
> On Sep 20, 2013 5:19 PM, "Sébastien Boisvert" wrote:
>
> On 20/09/13 05:06 PM, Nathaniel Jue wrote:
>
> Hi,
>
> I've run into a bit of an issue with Ray (v2.3.0-devel) and
> was wondering if you might be able to give me some advice/help. I keep
> getting this error message when I try to run Ray on all of my data with the
> following command (the ellipsis stands for the first two lines of the error
> being repeated 29 more times, once for each processor, or 30 processors in
> total). There is quite a bit of data in the analysis (2 runs of MiSeq data and
> 2 lanes of HiSeq):
>
> >mpiexec -n 20 Ray -k 31 -p
> /data2/reads/illumina/limulus/GRL1397_S1_L001.left.rept.corr.fasta
> /data2/reads/illumina/limulus/GRL1397_S1_L001.right.rept.corr.fasta -p
> /data2/reads/illumina/limulus/GRL1402_errorCorrect/GRL1402.left.2.rept.corr.fasta
> /data2/reads/illumina/limulus/GRL1402_errorCorrect/GRL1402.right.rept.corr.fasta
> -s /data2/reads/illumina/limulus/GRL1397_S1_L001.up.rept.corr.fasta -s
> /data2/reads/illumina/limulus/GRL1402_errorCorrect/GRL1402.up.rept.corr.fasta
> -p
> /data2/reads/illumina/limulus/8871_CGATGT_L003_errorCorrect/8871_CGATGT_L003.left.2.rept.corr.fasta
> /data2/reads/illumina/limulus/8871_CGATGT_L003_errorCorrect/8871_CGATGT_L003.right.rept.corr.fasta
> -p
> /data2/reads/illumina/limulus/8871_CGATGT_L004_errorCorrect/8871_CGATGT_L004.left.2.rept.corr.fasta
> /data2/reads/illumina/limulus/8871_CGATGT_L004_errorCorrect/8871_CGATGT_L004.right.rept.corr.fasta
> -s
> /data2/reads/illumina/limulus/8871_CGATGT_L003_errorCorrect/8871_CGATGT_L003.up.rept.corr.fasta
> -s
> /data2/reads/illumina/limulus/8871_CGATGT_L004_errorCorrect/8871_CGATGT_L004.up.rept.corr.fasta
> -o limulus_ray_IlluminaOnly
>
>
> Subsequent error message:
>
> Critical exception: The system is out of memory, returned NULL.
> Requested -109661072 bytes of type RAY_MALLOC_TYPE_BLOOM_FILTER
>
>
> So you are getting this with the git version of Ray. Strange.
>
> ...
>
> --------------------------------------------------------------------------
> mpiexec has exited due to process rank 8 with PID 22018 on
> node redqueen exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpiexec (as reported here).
> --------------------------------------------------------------------------
>
>
> I did look in the Ray mailing list, installed the
> development version of the program and Ray Platform, and found a discussion
> of a patch, which I tried to apply to the program. When I did that, I got
> this:
>
> patching file code/VerticesExtractor/VerticesExtractor.cpp
> patching file code/VerticesExtractor/VerticesExtractor.h
>
> Reversed (or previously applied) patch detected! Assume -R? [n]
>
>
> You don't need to patch the code since the git repository
> includes these
> patches already.
>
> These patches are to be applied on Ray 2.2.0.
>
> which I have to respond "y" to in order to patch the
> program. Even after patching, though, the program still gives me these
> errors. I will also add that when I tried an assembly with just the MiSeq
> data, the program was able to finish the assembly with the same 20
> processors.
>
> I am using our supercomputer to do this assembly, which
> consists of 48 Intel(R) Xeon(R) X7542 CPUs @ 2.67GHz (I think each has 6
> cores, if I remember right; I'm not the hardware guy so not sure about that)
> with something like 500GB of RAM (again, I think).
>
> Do you have any thoughts or insight into what might be going
> on? MPI or Ray issue?
>
>
>
> The number of bytes for the Bloom filter depends on the number
> of reads, mostly.
> I think it is a bug in Ray that occurs when you have too many
> reads
> and not enough ranks.
>
> Can you search for BloomFilter in your log ?
>
> You can do that with this command:
>
> grep BloomFilter job.log
>
>
>
> Regards,
> Nate
>
>
>
>