On Nov 30, 2007 2:47 AM, Martin Spacek <[EMAIL PROTECTED]> wrote:
> I need to load a 1.3GB binary file entirely into a single numpy.uint8
> array. I've been using numpy.fromfile(), but for files > 1.2GB on my
> win32 machine, I get a memory error. Actually, since I have several
> other python modules imported at the same time, including pygame, I get
> a "pygame parachute" and a segfault that dumps me out of python:
>
> data = numpy.fromfile(f, numpy.uint8) # where f is the open file
>
> 1382400000 items requested but only 0 read
> Fatal Python error: (pygame parachute) Segmentation Fault

You might try numpy.memmap -- others have had success with it for large
files (32 bit should be able to handle a 1.3 GB file, AFAIK). See for example:

http://www.thescripts.com/forum/thread654599.html

Kurt

>
> If I stick to just doing it at the interpreter with only numpy imported,
> I can open up files that are roughly 100MB bigger, but any more than
> that and I get a clean MemoryError. This machine has 2GB of RAM. I've
> tried setting the /3GB switch on winxp bootup, as well as all the
> registry suggestions at
> http://www.msfn.org/board/storage-process-command-t62001.html. No luck.
> I get the same error in (32bit) ubuntu for a sufficiently big file.
>
> I find that if I load the file in two pieces into two arrays, say 1GB
> and 0.3GB respectively, I can avoid the memory error. So it seems that
> it's not that windows can't allocate the memory, just that it can't
> allocate enough contiguous memory. I'm OK with this, but for indexing
> convenience, I'd like to be able to treat the two arrays as if they were
> one. Specifically, this file is movie data, and the array I'd like to
> get out of this is of shape (nframes, height, width). Right now I'm
> getting two arrays that are something like (0.8*nframes, height, width)
> and (0.2*nframes, height, width). Later in the code, I only need to
> index over the 0th dimension, i.e. the frame index.
>
> I'd like to access all the data using a single range of frame indices.
> Is there any way to combine these two arrays into what looks like a
> single array, without having to do any copying within memory? I've tried
> using numpy.concatenate(), but that gives me a MemoryError because, I
> presume, it's doing a copy. Would it be better to load the file one
> frame at a time, generating nframes arrays of shape (height, width), and
> sticking them consecutively in a python list?
>
> I'm using numpy 1.0.4 (compiled from source tarball with Intel's MKL
> library) on python 2.5.1 in winxp.
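
For what it's worth, here is a minimal sketch of the memmap idea applied to
frame-indexed movie data. The file name, the tiny frame dimensions, and the
stand-in data are all made up for illustration, and it is written against a
current numpy rather than 1.0.4, so treat it as a starting point only. Note
that a memory map still needs contiguous virtual address space in a 32-bit
process, so a full-size mapping could fail too; mapping a smaller window
with the `offset` and `shape` arguments may be a fallback in that case.

```python
import os
import tempfile

import numpy as np

# Hypothetical dimensions, kept tiny so the sketch runs anywhere;
# the real movie would use its own (nframes, height, width).
nframes, height, width = 10, 4, 6

# Write a small stand-in for the raw uint8 movie file (assumed layout:
# frames stored back to back, row-major, no header).
path = os.path.join(tempfile.mkdtemp(), "movie.bin")
expected = (np.arange(nframes * height * width) % 256).astype(np.uint8)
expected.tofile(path)

# Map the file instead of reading it all; data is paged in on access.
movie = np.memmap(path, dtype=np.uint8, mode="r",
                  shape=(nframes, height, width))

# Index over the 0th dimension with a single range of frame indices.
# np.array() copies just this one frame into ordinary memory.
frame = np.array(movie[3])
print(movie.shape, frame.shape)
```

Indexing `movie[i]` then behaves like indexing one big array, without
loading or concatenating the two halves by hand.
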
>
> Thanks for any advice,
>
> Martin
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>