On Mon, Jul 2, 2012 at 11:52 PM, Sveinung Gundersen <svein...@gmail.com> wrote:
>
> On 2 July 2012, at 22:40, Nathaniel Smith wrote:
>
>> On Mon, Jul 2, 2012 at 6:54 PM, Sveinung Gundersen <svein...@gmail.com>
>> wrote:
>>> [snip]
>>>
>>> Your actual memory usage may not have increased as much as you think,
>>> since memmap objects don't necessarily take much memory -- it sounds
>>> like you're leaking virtual memory, but your resident set size
>>> shouldn't go up as much.
>>>
>>> As I understand it, memmap objects retain the contents of the memmap in
>>> memory after it has been read the first time (in a lazy manner). Thus, when
>>> reading a slice of a 24GB file, only that part resides in memory. Our system
>>> reads a slice of a memmap, calculates something (say, the sum), and then
>>> deletes the memmap. It then loops through this for consecutive slices,
>>> retaining a low memory usage. Consider the following code:
>>>
>>> import numpy as np
>>> res = []
>>> vecLen = 3095677412
>>> for i in xrange(vecLen/10**8+1):
>>>     x = i * 10**8
>>>     y = min((i+1) * 10**8, vecLen)
>>>     res.append(np.memmap('val.float64', dtype='float64')[x:y].sum())
>>>
>>> The memory usage of this code on a 24GB file (one value for each nucleotide
>>> in the human DNA!) is 23g resident memory after the loop is finished (not
>>> 24g for some reason..).
>>>
>>> Running the same code on 1.5.1rc1 gives a resident memory of 23m after the
>>> loop.
>>
>> Your memory measurement tools are misleading you. The same memory is
>> resident in both cases, just in one case your tools say it is
>> operating system disk cache (and not attributed to your app), and in
>> the other case that same memory, treated in the same way by the OS, is
>> shown as part of your app's resident memory. Virtual memory is
>> confusing...
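
For reference, the loop quoted above can be sketched in Python 3 with the
memmap opened once and sliced repeatedly, instead of being created anew for
every slice. This is only a sketch under assumptions: the helper name
chunked_sums, the chunk size and the small temporary demo file are made up
for illustration. It is not the patch from the ticket, and it does not by
itself change which counter your tools attribute the mapped pages to.

```python
import os
import tempfile

import numpy as np


def chunked_sums(path, chunk_len, dtype="float64"):
    """Sum consecutive slices of a file through a single read-only memmap.

    chunked_sums is a hypothetical helper, not part of NumPy.
    """
    # Open the mapping once, outside the loop, rather than once per slice.
    data = np.memmap(path, dtype=dtype, mode="r")
    sums = []
    for start in range(0, data.shape[0], chunk_len):
        # A slice that runs past the end is clipped, so no min() is needed.
        sums.append(float(data[start:start + chunk_len].sum()))
    del data  # drop our reference to the mapping
    return sums


# Small stand-in for 'val.float64' (assumed demo data: the values 0..999).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.float64")
np.arange(1000, dtype="float64").tofile(demo_path)

res = chunked_sums(demo_path, chunk_len=300)
```

On the real file the call would presumably look like
chunked_sums('val.float64', 10**8), matching the slice size in the quoted
loop.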
>
> But the crucial difference is perhaps that the disk cache can be cleared by
> the OS if needed, but not the application memory in the same way, which must
> be swapped to disk? Or am I still confused?
>
> (snip)
>
>>>
>>> Great! Any idea on whether such a patch may be included in 1.7?
>>
>> Not really, if I or you or someone else gets inspired to take the time
>> to write a patch soon then it will be, otherwise not...
>>
>> -N
>
> I have now tried to add a patch, in the way you proposed, but I may have
> gotten it wrong..
>
> http://projects.scipy.org/numpy/ticket/2179

I put this patch in a GitHub repo and added tests (author credit to Sveinung):
https://github.com/thouis/numpy/tree/mmap_children

I'm not sure which branch to issue a pull request against, though.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion