On 23/09/13 11:44 AM, Nathaniel Jue wrote:
> Sebastien,
>
> Okay, if I'm reading the output correctly, this should be the memory
> allocation in KiB (is that bits or bytes? Or Kibibytes? I'm not sure) for
> each rank (which should basically be each processor, right?). Reads
> allocated to each rank were 32868106.

Hi Nathaniel,

The Bloom filter is a component that absorbs most of the sequencing errors,
which reduces the memory used by the graph.

The size of the Bloom filter is a linear function of the number of reads because
the number of sequencing errors is also a linear function of the number of 
reads.


In your log, it says "Requested -109661072 bytes of type RAY_MALLOC_TYPE_BLOOM_FILTER".

The requested size is negative, so there is a bug in the code that causes an
integer overflow.


The number of bits for the Bloom filter is (as of Ray v2.3.0-devel):

    Bits = NumberOfReads * 4 * 2 * 2 * KmerLength

In your case:

    Bits = 32868106 * 4 * 2 * 2 * 31 = 16 302 580 576 bits (2 037 822 572 bytes).
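
The overflow can be reproduced with a short Python sketch. It assumes (this is
not confirmed from the Ray source code) that the bit count is stored in a
signed 32-bit integer and then converted to 64-bit words with truncating
integer division. The helper names are hypothetical. Under those assumptions,
the arithmetic gives the same negative byte count that appears in the log.

```python
def bloom_filter_bits(reads_per_rank: int, kmer_length: int) -> int:
    """Bits = NumberOfReads * 4 * 2 * 2 * KmerLength (formula quoted above)."""
    return reads_per_rank * 4 * 2 * 2 * kmer_length


def wrap_int32(value: int) -> int:
    """Emulate storing a value in a signed 32-bit integer (two's complement)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def trunc_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero, as in C and C++."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


bits = bloom_filter_bits(32868106, 31)
wrapped_bits = wrap_int32(bits)      # assumption: 32-bit signed storage
words = trunc_div(wrapped_bits, 64)  # assumption: filter made of 64-bit words
requested_bytes = words * 8

print(bits)               # 16302580576
print(bits > 2**31 - 1)   # True: does not fit in a signed 32-bit integer
print(wrapped_bits)       # -877288608
print(requested_bytes)    # -109661072
```

The true bit count is well above 2^31 - 1, which is why it cannot be held in a
32-bit signed integer.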


So this is a bug in Ray.


That said, 657362120 reads (20 ranks * 32868106 reads per rank) is a lot of
reads.
You may want to scale out on this (add more cores).


I created a ticket here: https://github.com/sebhtml/ray/issues/196


However, at the moment I am busy with postdoctoral scholarship applications.


Here are 2 possible workarounds that I can offer:

1. Set the number of bits manually with -bloom-filter-bits. For example, to use
512 MiB of memory for the Bloom filter on each MPI rank, use
-bloom-filter-bits 4294967296 (512 * 1024 * 1024 * 8 bits).

2. Use more MPI ranks (more processor cores).

If you go from 20 to 200 MPI ranks, each rank receives about 10 times fewer
reads, so its Bloom filter is about 10 times smaller, and everything will be
faster.
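
The two workarounds can be estimated with a short Python sketch. It assumes
that reads are spread evenly over the ranks and that the bit count has to fit
in a signed 32-bit integer; both are assumptions, and the function names are
hypothetical. It also converts a per-rank memory budget in MiB into a bit
count that could be passed to -bloom-filter-bits.

```python
INT32_MAX = 2**31 - 1  # assumption: the bit count is held in a signed 32-bit int


def mib_to_bits(mib: int) -> int:
    """Convert a memory budget in MiB into a number of bits."""
    return mib * 1024 * 1024 * 8


def bits_per_rank(total_reads: int, ranks: int, kmer_length: int) -> int:
    """Default Bloom filter bits per rank, assuming an even split of reads."""
    reads_per_rank = -(-total_reads // ranks)  # ceiling division
    return reads_per_rank * 4 * 2 * 2 * kmer_length


print(mib_to_bits(512))  # 4294967296

total_reads = 657362120
for ranks in (20, 200):
    bits = bits_per_rank(total_reads, ranks, 31)
    print(ranks, bits, bits <= INT32_MAX)
# 20 16302580576 False
# 200 1630258256 True
```

With 200 ranks the default bit count per rank drops below 2^31, so under these
assumptions it would no longer overflow.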


  
               Séb

>
> Rank-KiB
>
> 0-3008256
> 1-2074124
> 2-1926640
> 3-1926636
> 4-1926640
> 5-1926636
> 6-1074460
> 7-1926636
> 8-1926636
> 9-1926640
> 10-1910248
> 11-1074460
> 12-1926640
> 13-1926640
> 14-1926640
> 15-1926640
> 16-1074460
> 17-1926640
> 18-1926640
> 19-1910248
>
> Rough estimate adds up to around 40GB, right?
>
> Cheers,
> Nate
>
> On Mon, Sep 23, 2013 at 9:36 AM, Sébastien Boisvert
> <sebastien.boisver...@ulaval.ca> wrote:
>
>     On 22/09/13 10:13 PM, Nathaniel Jue wrote:
>
>         Hi Sebastien,
>
>         By job.log, I assume you mean the stdout. I tried grepping
>         "BloomFilter" in both that output and all the files created in the
>         output directory. The only instances of the term BloomFilter occurred
>         in the test error example I sent you earlier, which occurs right after
>         loading all the reads. Do you think this is a too-many-reads issue or
>         something else? If so, any suggestions on how to deal with that?
>
>
>     (Please use the list.)
>
>     It would be informative to get the number of bits that the Bloom filter
>     is trying to allocate -- this is the information that can be found in
>     the standard output.
>
>
>         Regards,
>         Nate
>
>
>         On Sep 20, 2013 5:19 PM, "Sébastien Boisvert"
>         <sebastien.boisver...@ulaval.ca> wrote:
>
>              On 20/09/13 05:06 PM, Nathaniel Jue wrote:
>
>                  Hi,
>
>                  I've run into a bit of an issue with Ray (v2.3.0-devel) and
>                  was wondering if you might be able to give me some
>                  advice/help. I keep getting this error message when I try to
>                  run Ray on all of my data with the following command (the
>                  ellipsis represents the first two lines of the error being
>                  repeated 29 more times, once for each processor, or all 30
>                  processors in total). There is quite a bit of data in the
>                  analysis (2 runs of MiSeq data and 2 lanes of HiSeq):
>
>                  mpiexec -n 20 Ray -k 31 \
>                    -p /data2/reads/illumina/limulus/GRL1397_S1_L001.left.rept.corr.fasta \
>                       /data2/reads/illumina/limulus/GRL1397_S1_L001.right.rept.corr.fasta \
>                    -p /data2/reads/illumina/limulus/GRL1402_errorCorrect/GRL1402.left.2.rept.corr.fasta \
>                       /data2/reads/illumina/limulus/GRL1402_errorCorrect/GRL1402.right.rept.corr.fasta \
>                    -s /data2/reads/illumina/limulus/GRL1397_S1_L001.up.rept.corr.fasta \
>                    -s /data2/reads/illumina/limulus/GRL1402_errorCorrect/GRL1402.up.rept.corr.fasta \
>                    -p /data2/reads/illumina/limulus/8871_CGATGT_L003_errorCorrect/8871_CGATGT_L003.left.2.rept.corr.fasta \
>                       /data2/reads/illumina/limulus/8871_CGATGT_L003_errorCorrect/8871_CGATGT_L003.right.rept.corr.fasta \
>                    -p /data2/reads/illumina/limulus/8871_CGATGT_L004_errorCorrect/8871_CGATGT_L004.left.2.rept.corr.fasta \
>                       /data2/reads/illumina/limulus/8871_CGATGT_L004_errorCorrect/8871_CGATGT_L004.right.rept.corr.fasta \
>                    -s /data2/reads/illumina/limulus/8871_CGATGT_L003_errorCorrect/8871_CGATGT_L003.up.rept.corr.fasta \
>                    -s /data2/reads/illumina/limulus/8871_CGATGT_L004_errorCorrect/8871_CGATGT_L004.up.rept.corr.fasta \
>                    -o limulus_ray_IlluminaOnly
>
>
>                  Subsequent error message:
>
>                  Critical exception: The system is out of memory, returned NULL.
>                  Requested -109661072 bytes of type RAY_MALLOC_TYPE_BLOOM_FILTER
>
>
>              So you are getting this with the git version of Ray. Strange.
>
>                  ...
>                  --------------------------------------------------------------------------
>
>                  mpiexec has exited due to process rank 8 with PID 22018 on
>                  node redqueen exiting improperly. There are two reasons this could occur:
>
>                  1. this process did not call "init" before exiting, but others in
>                  the job did. This can cause a job to hang indefinitely while it waits
>                  for all processes to call "init". By rule, if one process calls "init",
>                  then ALL processes must call "init" prior to termination.
>
>                  2. this process called "init", but exited without calling "finalize".
>                  By rule, all processes that call "init" MUST call "finalize" prior to
>                  exiting or it will be considered an "abnormal termination"
>
>                  This may have caused other processes in the application to be
>                  terminated by signals sent by mpiexec (as reported here).
>                  --------------------------------------------------------------------------
>
>
>                  I did look in the Ray mailing list, installed the development
>                  version of the program and Ray Platform, and found discussion
>                  of a patch, which I tried to apply to the program. When I did
>                  that, I got this:
>
>                  patching file code/VerticesExtractor/VerticesExtractor.cpp
>                  patching file code/VerticesExtractor/VerticesExtractor.h
>
>                  Reversed (or previously applied) patch detected!  Assume -R? [n]
>
>
>              You don't need to patch the code since the git repository includes
>              these patches already.
>
>              These patches are to be applied on Ray 2.2.0.
>
>                  which I have to respond "y" to in order to patch the program.
>                  Even after patching, though, the program still gives me these
>                  errors. I will also add that when I tried an assembly with
>                  just the MiSeq data, the program was able to finish the
>                  assembly with the same 20 processors.
>
>                  I am using our supercomputer to do this assembly, which
>                  consists of 48 Intel(R) Xeon(R) X7542 CPUs @ 2.67GHz (I think
>                  each has 6 cores, if I remember right; I'm not the hardware
>                  guy so not sure about that) with something like 500GB of RAM
>                  (again, I think).
>
>                  Do you have any thoughts or insight into what might be going
>                  on? MPI or Ray issue?
>
>
>
>              The number of bytes for the Bloom filter depends on the number of
>              reads, mostly. I think it is a bug in Ray that occurs when you have
>              too many reads and not enough ranks.
>
>              Can you search for BloomFilter in your log ?
>
>              You can do that with this command:
>
>              grep BloomFilter job.log
>
>
>
>                  Regards,
>                  Nate
>
>
>
>


_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
