Unfortunately, for now, shutting down the worker processes at each iteration 
will probably be the simplest fix.  The underlying issue might 
be https://github.com/JuliaLang/julia/issues/6597, in which case splitting up 
the work yourself would not help.  Whatever you try, it would be good to 
mention the workaround in the linked issue if it helps to narrow down the 
problem.
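
A minimal sketch of that restart-per-iteration workaround, not the actual 
gtrain.jl code.  It assumes a current Julia release, where addprocs, pmap and 
rmprocs come from the Distributed standard library (on v0.3 they are exported 
from Base, and some of the syntax below differs).  The worker count, iteration 
count, and toy chunks are hypothetical stand-ins for the real data.

```julia
using Distributed  # current Julia; on v0.3 these functions are in Base

nworkers_wanted = 2   # hypothetical worker count
niter = 3             # hypothetical number of iterations

totals = Float64[]
for iter in 1:niter
    # Start fresh worker processes for this iteration only.
    pids = addprocs(nworkers_wanted)

    # Toy stand-in for the per-iteration data: four small chunks.
    chunks = [fill(Float64(iter), 1000) for _ in 1:4]

    # The anonymous function is serialized to the workers by pmap,
    # so no @everywhere definition is needed in this sketch.
    partial = pmap(c -> sum(c), chunks)
    push!(totals, sum(partial))

    # Shut the workers down so the OS reclaims whatever they were holding.
    rmprocs(pids)
end
```

Because each batch of workers exits at the end of its iteration, the memory 
they held goes back to the operating system regardless of what gc() managed 
to reclaim.  This only addresses memory held by the workers; memory retained 
in the master process is unaffected.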

Best,
Jake

On Monday, March 16, 2015 at 4:03:34 PM UTC-4, Deniz Yuret wrote:
>
> Is there a suggested workaround?  e.g. Split the array yourself instead of 
> using DArrays, or shut down the workers and restart them every iteration 
> etc.?
>
> best,
> deniz
>
>
> On Mon, Mar 16, 2015 at 9:39 PM, Jake Bolewski <[email protected]> wrote:
>
>> I suspect you are running into 
>> https://github.com/JuliaLang/julia/issues/8912.
>>
>> Best,
>> Jake
>>
>>
>> On Monday, March 16, 2015 at 2:53:25 PM UTC-4, Deniz Yuret wrote:
>>
>>> I am stuck trying to debug a memory leak issue.  What is the best way to 
>>> find out what gc is doing?
>>>
>>> My program generates and processes 10GB of data every iteration.  I set 
>>> the data variables to "nothing" and explicitly call gc() every time to 
>>> make sure the space is reclaimed.  However, the memory usage keeps growing 
>>> and ends up crashing the machine (unfortunately several hours into the 
>>> run).  The table below is the output of 'ps aux' at every iteration.  The 
>>> growth is irregular, as can be seen from the RSS column.  If I weren't 
>>> cleaning up properly, I would expect more regular growth every iteration.  
>>> The changes in RSS seem to be around 10GB, so I suspect Julia sometimes 
>>> fails to reclaim the memory from previous iterations.  The program is also 
>>> multithreaded: it uses pmap to process data on multiple cores, and I also 
>>> suspect gc() may have an issue with multiple threads.  The Julia version is 
>>> v0.3.5.  Any pointers would be appreciated.
>>>
>>> best,
>>> deniz
>>>
>>>
>>> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>>> dyuret   16245 79.7  9.1 176446240 12077528 ?  R<Ll 20:41   4:49 julia3 gtrain.jl
>>> dyuret   16245 74.8 17.2 187169612 22796900 ?  S<Ll 20:41   9:10 julia3 gtrain.jl
>>> dyuret   16245 73.4 25.0 197398544 33027964 ?  S<Ll 20:41  13:30 julia3 gtrain.jl
>>> dyuret   16245 72.8 17.7 187786336 23415824 ?  S<Ll 20:41  17:52 julia3 gtrain.jl
>>> dyuret   16245 72.4 17.6 187726364 23358316 ?  S<Ll 20:41  22:13 julia3 gtrain.jl
>>> dyuret   16245 72.0 25.0 197479144 33111112 ?  S<Ll 20:41  26:34 julia3 gtrain.jl
>>> dyuret   16245 71.8 25.1 197594204 33222044 ?  S<Ll 20:41  30:55 julia3 gtrain.jl
>>> dyuret   16245 71.6 32.3 207126888 42755400 ?  S<Ll 20:41  35:16 julia3 gtrain.jl
>>> dyuret   16245 71.5 32.1 206849040 42472956 ?  S<Ll 20:41  39:37 julia3 gtrain.jl
>>> dyuret   16245 71.4 25.5 198175184 33806360 ?  S<Ll 20:41  43:58 julia3 gtrain.jl
>>> dyuret   16245 71.3 25.5 198175184 33806740 ?  S<Ll 20:41  48:20 julia3 gtrain.jl
>>> dyuret   16245 71.2 25.6 198179280 33811084 ?  S<Ll 20:41  52:41 julia3 gtrain.jl
>>> dyuret   16245 71.2 33.0 208054076 43685884 ?  S<Ll 20:41  57:02 julia3 gtrain.jl
>>> dyuret   16245 71.2 25.8 198521392 34153384 ?  S<Ll 20:41  61:24 julia3 gtrain.jl
>>> dyuret   16245 71.1 32.6 207497648 43129636 ?  S<Ll 20:41  65:44 julia3 gtrain.jl
>>> dyuret   16245 71.1 32.1 206831844 42463844 ?  S<Ll 20:41  70:06 julia3 gtrain.jl
>>> dyuret   16245 71.0 39.8 217030332 52662460 ?  S<Ll 20:41  74:27 julia3 gtrain.jl
>>> dyuret   16245 71.0 32.6 207497648 43129780 ?  S<Ll 20:41  78:48 julia3 gtrain.jl
>>> dyuret   16245 71.0 33.4 208505564 44137820 ?  S<Ll 20:41  83:10 julia3 gtrain.jl
>>> dyuret   16245 71.0 32.6 207497648 43129904 ?  S<Ll 20:41  87:32 julia3 gtrain.jl
>>> dyuret   16245 70.9 33.0 208059856 43692112 ?  S<Ll 20:41  91:53 julia3 gtrain.jl
>>> dyuret   16245 70.9 40.8 218361940 53994196 ?  S<Ll 20:41  96:14 julia3 gtrain.jl
>>> dyuret   16245 70.9 47.5 227228820 62861076 ?  S<Ll 20:41 100:35 julia3 gtrain.jl
>>> dyuret   16245 70.9 47.5 227228820 62861076 ?
>>>
>>> ...
>>
>>
>
