On Mon, Jul 2, 2012 at 11:52 PM, Sveinung Gundersen <svein...@gmail.com> wrote:
>
> On 2. juli 2012, at 22.40, Nathaniel Smith wrote:
>
>> On Mon, Jul 2, 2012 at 6:54 PM, Sveinung Gundersen <svein...@gmail.com> 
>> wrote:
>>> [snip]
>>>
>>>
>>>
>>> Your actual memory usage may not have increased as much as you think,
>>> since memmap objects don't necessarily take much memory -- it sounds
>>> like you're leaking virtual memory, but your resident set size
>>> shouldn't go up as much.
>>>
>>>
>>> As I understand it, memmap objects retain the contents of the memmap in
>>> memory after it has been read the first time (in a lazy manner). Thus, when
>>> reading a slice of a 24GB file, only that part resides in memory. Our system
>>> reads a slice of a memmap, calculates something (say, the sum), and then
>>> deletes the memmap. It then loops through this for consecutive slices,
>>> retaining a low memory usage. Consider the following code:
>>>
>>> import numpy as np
>>> res = []
>>> vecLen = 3095677412
>>> for i in xrange(vecLen/10**8+1):
>>>     x = i * 10**8
>>>     y = min((i+1) * 10**8, vecLen)
>>>     res.append(np.memmap('val.float64', dtype='float64')[x:y].sum())
>>>
>>> The memory usage of this code on a 24GB file (one value for each nucleotide
>>> in the human DNA!) is 23g resident memory after the loop is finished (not
>>> 24g for some reason..).
>>>
>>> Running the same code on 1.5.1rc1 gives a resident memory of 23m after the
>>> loop.
>>
>> Your memory measurement tools are misleading you. The same memory is
>> resident in both cases, just in one case your tools say it is
>> operating system disk cache (and not attributed to your app), and in
>> the other case that same memory, treated in the same way by the OS, is
>> shown as part of your app's resident memory. Virtual memory is
>> confusing...
>
> But perhaps the crucial difference is that the OS can simply drop disk
> cache when it needs the memory, whereas application memory must be swapped
> to disk? Or am I still confused?
>
> (snip)
>
>>>
>>> Great! Any idea on whether such a patch may be included in 1.7?
>>
>> Not really, if I or you or someone else gets inspired to take the time
>> to write a patch soon then it will be, otherwise not...
>>
>> -N
>
> I have now tried to add a patch, in the way you proposed, but I may have 
> gotten it wrong..
>
> http://projects.scipy.org/numpy/ticket/2179

I put this in a github repo, and added tests (author credit to Sveinung)
https://github.com/thouis/numpy/tree/mmap_children

I'm not sure which branch to open the PR against, though.
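
For anyone following along, here is a minimal sketch of the chunked-sum
pattern quoted above. It is not the patch from the ticket or the branch.
It assumes Python 3 and a reasonably recent NumPy; the helper name
chunked_sums, the chunk_len parameter and the ten-value demo file are made
up for illustration. Unlike the quoted loop, it maps the file once and
stores each result as a plain float, so nothing kept in the result list
refers back to the mapping.

```python
import os
import tempfile

import numpy as np


def chunked_sums(path, chunk_len, dtype="float64"):
    """Return the sum of each consecutive chunk_len-element slice of a file."""
    # Map the file once, read-only; data is only read when a slice is touched.
    mm = np.memmap(path, dtype=dtype, mode="r")
    n = mm.shape[0]
    sums = []
    for start in range(0, n, chunk_len):
        stop = min(start + chunk_len, n)
        # float() stores a plain Python number, not an array tied to the map.
        sums.append(float(mm[start:stop].sum()))
    del mm  # drop our reference to the mapping before returning
    return sums


# Tiny stand-in for 'val.float64' (hypothetical demo data): 0.0 through 9.0.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "val.float64")
np.arange(10, dtype="float64").tofile(demo_path)

demo_sums = chunked_sums(demo_path, chunk_len=4)
print(demo_sums)
```

With chunk_len=4 the demo file is read as the slices 0..3, 4..7 and 8..9.
Whether resident memory stays low on a really large file still depends on
the NumPy version and on how the OS accounts for mapped pages, as discussed
above.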
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
