I see that PyTables deals with HDF5 data. It would be very nice if the data were in such a standard format, but that is not the case, and it cannot be changed.
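Since the data has to stay in its raw binary layout, a memory-bounded alternative to the cython loop discussed below is to read the file in large chunks with numpy.fromfile and stride over each chunk. This is only a minimal sketch, not code from the thread: the path, dtype, stride, and chunk size are hypothetical placeholders, and it assumes a flat file of fixed-size records:

    import numpy as np

    def read_subsampled(path, dtype=np.uint32, step=20, chunk_records=1000000):
        # Keep chunk boundaries aligned with the stride, so every chunk
        # starts on a record index that is a multiple of `step`.
        assert chunk_records % step == 0
        pieces = []
        with open(path, "rb") as f:
            while True:
                chunk = np.fromfile(f, dtype=dtype, count=chunk_records)
                if chunk.size == 0:  # end of file
                    break
                # A plain stride then picks the globally correct records;
                # copy() lets the full chunk be freed right away.
                pieces.append(chunk[::step].copy())
        if not pieces:
            return np.empty(0, dtype=dtype)
        return np.concatenate(pieces)

Peak memory use is one chunk plus the accumulated output, regardless of the file size.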
________________________________________
From: numpy-discussion-boun...@scipy.org [numpy-discussion-boun...@scipy.org] on behalf of Frédéric Bastien [no...@nouiz.org]
Sent: Wednesday, March 13, 2013, 15:03
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks

Hi,

I would suggest that you look at pytables [1]. It uses a different file format, but it seems to do exactly what you want, and it gives you an object with an interface very similar to numpy.ndarray (though with fewer functions). You would just ask for the slice/indices that you want, and it returns a numpy.ndarray.

HTH

Frédéric

[1] http://www.pytables.org/moin

On Wed, Mar 13, 2013 at 9:54 AM, Nathaniel Smith <n...@pobox.com> wrote:
> On Wed, Mar 13, 2013 at 1:45 PM, Andrea Cimatoribus
> <andrea.cimatori...@nioz.nl> wrote:
>> Hi everybody, I hope this has not been discussed before; I couldn't
>> find a solution elsewhere.
>> I need to read some binary data, and I am using numpy.fromfile to do
>> this. Since the files are huge and would make me run out of memory, I
>> need to skip some records while reading (the data are recorded at high
>> frequency, so basically I want to subsample).
>> At the moment I have come up with the code below, which is then
>> compiled with cython. Despite the significant performance increase
>> over the pure python version, the function is still much slower than
>> numpy.fromfile, and it only reads one kind of data (in this case
>> uint32), since otherwise I do not know how to define the array type in
>> advance. I have basically no experience with cython or c, so I am a
>> bit stuck. How can I make this more efficient and possibly more
>> generic?
>
> If your data is stored as fixed-format binary (as it seems it is),
> then the easiest way is probably:
>
> # Exploit the operating system's virtual memory manager to get a
> # "virtual copy" of the entire file in memory
> # (this does not actually use any memory until accessed):
> virtual_arr = np.memmap(path, np.uint32, "r")
> # Get a numpy view onto every 20th entry:
> virtual_arr_subsampled = virtual_arr[::20]
> # Copy those bits into regular malloc'ed memory:
> arr_subsampled = virtual_arr_subsampled.copy()
>
> (Your data is probably large enough that this will only work on a
> 64-bit system, because of address-space limitations; but if you have
> data that's too large to fit into memory, I assume you're on a 64-bit
> system anyway...)
>
> -n
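For completeness, Frédéric's PyTables suggestion would look roughly like this, assuming a hypothetical HDF5 file data.h5 containing an array node /data (open_file is the PyTables 3.x spelling of the call):

    import tables  # PyTables

    # Slicing an on-disk array node reads only the requested records
    # from disk and returns a plain numpy.ndarray.
    with tables.open_file("data.h5", mode="r") as h5:
        subsampled = h5.root.data[::20]

This only applies if the data can first be converted to HDF5, which is ruled out above.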