On Sat, Oct 23, 2010 at 10:27 AM, braingateway <[email protected]> wrote:
> Charles R Harris:
>> On Sat, Oct 23, 2010 at 10:15 AM, Charles R Harris
>> <[email protected]> wrote:
>>
>>> On Sat, Oct 23, 2010 at 9:44 AM, braingateway <[email protected]>
>>> wrote:
>>>
>>>> David Cournapeau:
>>>>> 2010/10/23 braingateway <[email protected]>:
>>>>>> Hi everyone,
>>>>>> I noticed numpy.memmap using RAM to buffer data from memmap files.
>>>>>> If I get a 100GB array in a memmap file and process it block by
>>>>>> block, the RAM usage keeps increasing while the process runs,
>>>>>> until there is no available space in RAM (4GB), even though the
>>>>>> block size is only 1MB. For example:
>>>>>>
>>>>>> ####
>>>>>> import numpy as npy
>>>>>> a = npy.memmap('a.bin', dtype='float64', mode='r')
>>>>>> blocklen = 100000
>>>>>> b = npy.zeros((len(a) // blocklen,))
>>>>>> for i in range(0, len(a) // blocklen):
>>>>>>     b[i] = npy.mean(a[i * blocklen:(i + 1) * blocklen])
>>>>>> ####
>>>>>>
>>>>>> Is there any way to restrict the memory usage in numpy.memmap?
>>>>>
>>>>> The whole point of using memmap is to let the OS do the buffering
>>>>> for you (which is likely to do a better job than you in many
>>>>> cases). Which OS are you using? And how do you measure how much
>>>>> memory is taken by numpy for your array?
>>>>>
>>>>> David
>>>>
>>>> Hi David,
>>>>
>>>> I agree with you about the point of using memmap. That is why the
>>>> behavior is so strange to me. I actually measured the size of the
>>>> resident set (pink trace in figure 2) of the python process on
>>>> Windows. Here I attached the result. You can see the RAM usage is
>>>> definitely not file system cache.
>>>
>>> Umm, a good operating system will use *all* of RAM for buffering,
>>> because RAM is fast and it assumes you are likely to reuse data you
>>> have already used once. If it needs some memory for something else,
>>> it just writes a page to disk, if dirty, and reads in the new data
>>> from disk and changes the address of the page. Where you get into
>>> trouble is if pages can't be evicted for some reason. Most modern
>>> OSes also have special options available for reading streaming data
>>> from disk that can lead to significantly faster access for that sort
>>> of thing, but I don't think you can do that with memmapped files.
>>>
>>> I'm not sure how Windows labels its memory. IIRC, memmapping a file
>>> leads to what is called file-backed memory; it is essentially
>>> virtual memory. Now, I won't bet my life that there isn't a problem,
>>> but I think a misunderstanding of the memory information is more
>>> likely.
>>
>> It is also possible that something else in your program is hanging
>> onto memory, but without knowing a lot more it is hard to tell. Are
>> you seeing symptoms besides the memory graphs? It looks like you
>> aren't running on Windows, actually, so what OS are you running on?
>>
>> Chuck
>
> Hi Chuck,
>
> Thanks a lot for the quick response. I ran the following super simple
> script on Windows:
>
> ####
> import numpy as npy
> a = npy.memmap('a.bin', dtype='float64', mode='r')
> blocklen = 100000
> b = npy.zeros((len(a) // blocklen,))
> for i in range(0, len(a) // blocklen):
>     b[i] = npy.mean(a[i * blocklen:(i + 1) * blocklen])
> ####
>
> Everything became super slow after python ate all the RAM. By the way,
> I also tried Qt's QFile::map(); there was no problem at all...

Hmm. Nothing looks suspicious.
For reference, can you be specific about the OS/version, python version,
and numpy version? What happens if you simply do

for i in range(0, len(a) // blocklen):
    a[i * blocklen:(i + 1) * blocklen].copy()

Chuck
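For what it's worth, Chuck's test can be wrapped in a small harness that
also prints the resident set size as it runs, which is the number
braingateway has been watching. This is a minimal sketch, not code from
the thread: the psutil dependency, the reporting interval, and the reuse
of 'a.bin' are all assumptions.

####
import os

import numpy as np
import psutil  # assumed available; any RSS probe would do

a = np.memmap('a.bin', dtype='float64', mode='r')
blocklen = 100000
proc = psutil.Process(os.getpid())

for i in range(len(a) // blocklen):
    # Chuck's suggested test: touch each block and copy it out.
    a[i * blocklen:(i + 1) * blocklen].copy()
    if i % 1000 == 0:
        # RSS counts file-backed pages the OS is caching on our behalf,
        # so a big number here does not by itself prove a leak.
        print('block %d: rss = %.1f MB' % (i, proc.memory_info().rss / 1e6))
####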
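And for comparison with the memmap behaviour under discussion: the same
block means can be computed with plain buffered reads via np.fromfile,
which keeps the working set at one block regardless of what the page
cache does. A sketch under the same assumptions (a raw float64 file; any
trailing partial block is dropped):

####
import numpy as np

blocklen = 100000
means = []
with open('a.bin', 'rb') as f:
    while True:
        # Read one block of float64 values from the current position.
        block = np.fromfile(f, dtype='float64', count=blocklen)
        if len(block) < blocklen:
            break  # end of file (partial block ignored)
        means.append(block.mean())
b = np.array(means)
####

If this stays fast while the memmap loop bogs down, that would point at
page-cache pressure from the mapping rather than at numpy itself.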
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
