Charles R Harris


On Sat, Oct 23, 2010 at 10:27 AM, braingateway <[email protected]> wrote:

    Charles R Harris :
    >
    >
    > On Sat, Oct 23, 2010 at 10:15 AM, Charles R Harris
    > <[email protected]> wrote:
    >
    >
    >
    >     On Sat, Oct 23, 2010 at 9:44 AM, braingateway
    >     <[email protected]> wrote:
    >
    >         David Cournapeau :
    >
    >             2010/10/23 braingateway <[email protected]>:
    >
    >
    >                 Hi everyone,
    >                 I noticed that numpy.memmap uses RAM to buffer data
    >                 from memmapped files. If I have a 100 GB array in a
    >                 memmap file and process it block by block, the RAM
    >                 usage keeps growing as the process runs, until there
    >                 is no RAM left (4 GB), even though the block size is
    >                 only 1 MB. For example:
    >                 ####
    >                 import numpy
    >                 a = numpy.memmap('a.bin', dtype='float64', mode='r')
    >                 blocklen = 100000
    >                 b = numpy.zeros((len(a) // blocklen,))
    >                 for i in range(len(a) // blocklen):
    >                     b[i] = numpy.mean(a[i*blocklen:(i+1)*blocklen])
    >                 ####
    >                 Is there any way to restrict the memory usage in
    >                 numpy.memmap?
    >
    >
    >
    >             The whole point of using memmap is to let the OS do the
    >             buffering for you (it is likely to do a better job than
    >             you would in many cases). Which OS are you using? And
    >             how do you measure how much memory numpy takes for your
    >             array?
    >
    >             David
    >
    >
    >         Hi David,
    >
    >         I agree with you about the point of using memmap. That is
    >         why this behavior is so strange to me. I actually measured
    >         the resident set size (pink trace in figure 2) of the Python
    >         process on Windows. I have attached the result. You can see
    >         that the RAM usage is definitely not file-system cache.
    >
    >
    >     Umm, a good operating system will use *all* of RAM for
    >     buffering, because RAM is fast and the OS assumes you are likely
    >     to reuse data you have already used once. If it needs memory for
    >     something else, it just writes a page to disk (if dirty), reads
    >     the new data in from disk, and changes the address of the page.
    >     Where you get into trouble is if pages can't be evicted for some
    >     reason. Most modern OSes also have special options for reading
    >     streaming data from disk that can give significantly faster
    >     access for that sort of workload, but I don't think you can use
    >     them with memmapped files.
    >
    >     I'm not sure how Windows labels its memory. IIRC, memmapping a
    >     file gives you what is called file-backed memory, which is
    >     essentially virtual memory. Now, I won't bet my life that there
    >     isn't a problem, but I think a misunderstanding of the memory
    >     information is more likely.
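
The streaming read options mentioned above are exposed on POSIX systems
as posix_fadvise and madvise hints. A minimal sketch, assuming Linux and
a modern Python (os.posix_fadvise needs 3.3+, mmap.madvise needs 3.8+);
none of this was available in the poster's Windows / Python 2.6 setup:

####
import mmap
import os

fd = os.open('a.bin', os.O_RDONLY)  # 'a.bin' is the file from the thread

# Tell the kernel we will read sequentially, so it can read ahead
# aggressively and drop pages behind us sooner.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)

m = mmap.mmap(fd, 0, prot=mmap.PROT_READ)
m.madvise(mmap.MADV_SEQUENTIAL)     # same hint for the mapping itself

# ... process the mapping block by block ...

# When done, the process can ask the kernel to drop the mapping's
# pages explicitly:
m.madvise(mmap.MADV_DONTNEED)

m.close()
os.close(fd)
####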
    >
    >
    > It is also possible that something else in your program is hanging
    > onto memory, but without knowing a lot more it is hard to tell. Are
    > you seeing symptoms besides the memory graphs? It looks like you
    > aren't running on Windows, actually, so what OS are you running on?
    >
    > Chuck
    >
    >
    >
    Hi Chuck,

    Thanks a lot for the quick response. I do run the following super
    simple script on Windows:

    ####
    import numpy
    a = numpy.memmap('a.bin', dtype='float64', mode='r')
    blocklen = 100000
    b = numpy.zeros((len(a) // blocklen,))
    for i in range(len(a) // blocklen):
        b[i] = numpy.mean(a[i*blocklen:(i+1)*blocklen])
    ####
    Everything became super slow after Python ate all the RAM.
    By the way, I also tried Qt's QFile::map(); there was no problem at
    all...


Hmm. Nothing looks suspicious. For reference, can you be specific about the OS version, Python version, and numpy version?

What happens if you simply do

for i in range(len(a) // blocklen):
    a[i*blocklen:(i+1)*blocklen].copy()

Chuck
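
One way to instrument this test is to log the process's resident set
size as the loop runs. A minimal sketch, assuming the third-party
psutil package (not part of the original thread):

####
import numpy
import psutil

proc = psutil.Process()  # the current process
a = numpy.memmap('a.bin', dtype='float64', mode='r')
blocklen = 100000

for i in range(len(a) // blocklen):
    a[i*blocklen:(i+1)*blocklen].copy()
    if i % 1000 == 0:
        # RSS counts resident pages, *including* file-backed pages from
        # the mapping, so it can climb even when nothing is leaking.
        print('%d: %d MiB' % (i, proc.memory_info().rss // 2**20))
####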

Hi Chuck,
Here are the versions:
print sys.version
2.6.5 (r265:79096, Mar 19 2010, 18:02:59) [MSC v.1500 64 bit (AMD64)]
print numpy.__version__
1.4.1
print sys.getwindowsversion()
(5, 2, 3790, 2, 'Service Pack 2')

Besides, a[i*blocklen:(i+1)*blocklen].copy() gave the same result.

LittleBigBrain

[inline attachment: numpyMemmapAvaRAM3.png]
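
For readers hitting the same symptom: one possible workaround, not
suggested in the thread, is to map only a small window of the file at a
time using numpy.memmap's offset and shape arguments, so the amount of
file-backed address space the process holds stays bounded. A minimal
sketch (whether it helps on the poster's Windows setup is untested):

####
import os
import numpy

filename = 'a.bin'                  # the file from the thread
itemsize = numpy.dtype('float64').itemsize
blocklen = 100000                   # elements per block
nblocks = os.path.getsize(filename) // itemsize // blocklen

b = numpy.zeros((nblocks,))
for i in range(nblocks):
    # Map only the current block; deleting `block` releases the mapping,
    # so the process never holds more than one window at a time.
    block = numpy.memmap(filename, dtype='float64', mode='r',
                         offset=i * blocklen * itemsize,
                         shape=(blocklen,))
    b[i] = block.mean()
    del block
####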
