[Numpy-discussion] Designing a new storage format for numpy recarrays

2009-10-30 Thread Stephen Simmons
Hi, is anyone working on alternative storage options for numpy arrays, and specifically recarrays? My main application involves processing series of large recarrays (say 1000 recarrays, each with 5M rows having 50 fields). Existing options meet some but not all of my requirements.

Re: [Numpy-discussion] Designing a new storage format for numpy recarrays

2009-10-30 Thread Dag Sverre Seljebotn
Dag Sverre Seljebotn: Hi, Is anyone working on alternative storage options for numpy arrays, and specifically recarrays? My main application involves processing series of large recarrays (say 1000 recarrays, each with 5M rows having 50 fields). Existing options meet some but not all of my

Re: [Numpy-discussion] Designing a new storage format for numpy recarrays

2009-10-30 Thread Dag Sverre Seljebotn
Stephen Simmons wrote: P.S. Maybe this will be too much work, and I'd be better off sticking with PyTables. I can't judge that, but I want to share some thoughts (rant?): - Are you ready to not only write the code, but maintain it over years to come, and work through nasty bugs, and think

Re: [Numpy-discussion] Designing a new storage format for numpy recarrays

2009-10-30 Thread Zachary Pincus
Unless I read your request or the documentation wrong, h5py already supports pulling specific fields out of compound data types: http://h5py.alfven.org/docs-1.1/guide/hl.html#id3 For compound data, you can specify multiple field names alongside the numeric slices: dset["FieldA"]
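The field selection mentioned above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the field names, dataset name, file name and array size are all made up, and it only shows the h5py indexing syntax. It says nothing about how much data HDF5 reads or decompresses underneath.

```python
import numpy as np
import h5py

# Hypothetical record layout; the recarrays discussed in the thread have ~50 fields.
dtype = np.dtype([("FieldA", "f8"), ("FieldB", "i4"), ("FieldC", "f4")])
records = np.zeros(1000, dtype=dtype)
records["FieldA"] = np.arange(1000, dtype="f8")
records["FieldB"] = np.arange(1000, dtype="i4") * 2

# driver="core" with backing_store=False keeps the file in memory only,
# so nothing is written to disk for this demonstration.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("records", data=records)

    # A single field name returns a plain array of that field, for all rows.
    field_a = dset["FieldA"]

    # Field names can be combined with a numeric slice: rows 10..19, two fields.
    subset = dset[10:20, "FieldA", "FieldB"]
```

Here `field_a` is an ordinary float64 array, while `subset` is a structured array holding only the two requested fields.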

Re: [Numpy-discussion] Designing a new storage format for numpy recarrays

2009-10-30 Thread Francesc Alted
On Friday 30 October 2009 14:18:05, Stephen Simmons wrote: - PyTables (HDF using chunked storage for recarrays with LZO compression and shuffle filter) - can't extract individual field from a recarray Er... Have you tried the ``cols`` accessor?

Re: [Numpy-discussion] Designing a new storage format for numpy recarrays

2009-10-30 Thread Anne Archibald
2009/10/30 Stephen Simmons m...@stevesimmons.com: I should clarify what I meant. Suppose I have a recarray with 50 fields and want to read just one of those fields. PyTables/HDF will read in the compressed data for chunks of complete rows, decompress the full 50 fields, and then give me
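The cost described in that message can be illustrated with a small standard-library sketch. It is a toy model, not how PyTables or HDF5 are implemented: zlib stands in for LZO plus shuffle, and the row count, chunk size, and helper names are invented for the example. It compares how many bytes must be decompressed to read one field when chunks hold complete rows versus when each field is stored separately.

```python
import struct
import zlib

N_ROWS, N_FIELDS, CHUNK_ROWS = 2000, 50, 500   # illustrative sizes only
ROW = struct.Struct("<%dd" % N_FIELDS)         # one row = 50 little-endian float64

def value(row, field):
    """Deterministic fake data so both layouts can be compared."""
    return float(row * N_FIELDS + field)

# Row-oriented layout: every compressed chunk holds complete rows.
row_chunks = []
for start in range(0, N_ROWS, CHUNK_ROWS):
    raw = b"".join(
        ROW.pack(*(value(r, f) for f in range(N_FIELDS)))
        for r in range(start, start + CHUNK_ROWS)
    )
    row_chunks.append(zlib.compress(raw))

# Field-oriented layout: each field is compressed on its own.
col_chunks = [
    zlib.compress(
        struct.pack("<%dd" % N_ROWS, *(value(r, f) for r in range(N_ROWS)))
    )
    for f in range(N_FIELDS)
]

def read_field_from_rows(field):
    """Read one field; every chunk is decompressed in full (all 50 fields)."""
    out, decompressed = [], 0
    for chunk in row_chunks:
        raw = zlib.decompress(chunk)
        decompressed += len(raw)
        out.extend(rec[field] for rec in ROW.iter_unpack(raw))
    return out, decompressed

def read_field_from_cols(field):
    """Read one field; only that field's data is decompressed."""
    raw = zlib.decompress(col_chunks[field])
    return list(struct.unpack("<%dd" % N_ROWS, raw)), len(raw)

rows_result, rows_bytes = read_field_from_rows(7)
cols_result, cols_bytes = read_field_from_cols(7)
print(rows_bytes, cols_bytes)
```

Both calls return the same values, but in this toy model the row-oriented read decompresses 50 times as many bytes as the field-oriented one, which is the overhead the message is pointing at.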