(Re-raising an issue that was brought up last year: [1])

Since the Enthought webinar on memory-mapping numpy arrays[2] suggested
PyTables for creating new files (see slide 30 at [3]), I assumed by
association that PyTables also memory-mapped its data. I switched an
algorithm that kept its data in memory over to PyTables, and sure
enough memory usage dropped dramatically. Coming back to it now,
though, I find that performance took a big hit. On closer
investigation: no, PyTables doesn't mmap. Oops.

(Use case: we have a read-only matrix that's an array of vectors.
Given a probe vector, we want to find the top n vectors closest to it,
measured by dot product. numpy's dot function does exactly what we
want. But this runs in a multiprocess server, and these matrices are
largeish, so I thought memmap would be a good way to let the OS handle
sharing the matrix between the processes.)
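
For concreteness, here is a minimal sketch of that use case with a
plain numpy.memmap. The file name, shape, dtype, and the helper
top_n_by_dot are made up for illustration; it assumes one vector per
row and n >= 1.

```python
import os
import tempfile

import numpy as np


def top_n_by_dot(matrix, probe, n):
    """Return the indices of the n rows with the largest dot product
    against probe, best match first. Assumes n >= 1."""
    scores = np.dot(matrix, probe)        # one score per row vector
    n = min(n, scores.shape[0])
    # argpartition picks out the n largest scores without a full sort;
    # only those n candidates are then sorted.
    candidates = np.argpartition(scores, -n)[-n:]
    order = np.argsort(scores[candidates])[::-1]
    return candidates[order]


# Hypothetical file name and shape, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "vectors.dat")
rows, dims = 1000, 16

# Write the matrix to disk once...
rng = np.random.default_rng(0)
writer = np.memmap(path, dtype=np.float32, mode="w+", shape=(rows, dims))
writer[:] = rng.standard_normal((rows, dims)).astype(np.float32)
writer.flush()
del writer

# ...then every server process opens the same file read-only, so the
# OS page cache can back all of the mappings.
matrix = np.memmap(path, dtype=np.float32, mode="r", shape=(rows, dims))
probe = np.asarray(matrix[42])
best = top_n_by_dot(matrix, probe, 5)
```

Opening with mode="r" keeps the mapping read-only, which matches the
read-only matrix described above.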

(Array _rows_ are stored contiguously in numpy's default C order, right?)
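
The layout is easy to check rather than guess. A minimal sketch,
assuming a freshly created numpy array in the default C order; in that
case it is the rows, not the columns, that are contiguous. numpy.memmap
takes an order argument ('C' or 'F') if the file is laid out the other
way.

```python
import numpy as np

# A small 3x4 array in numpy's default (C, row-major) order.
a = np.arange(12, dtype=np.float64).reshape(3, 4)

whole_is_c = a.flags["C_CONTIGUOUS"]         # whole array is row-major
row_is_contig = a[1].flags["C_CONTIGUOUS"]   # a single row
col_is_contig = a[:, 1].flags["C_CONTIGUOUS"]  # a single column

# Strides are in bytes: stepping to the next row skips 4 * 8 bytes,
# stepping to the next column skips 8 bytes.
strides = a.strides

# The same data in Fortran (column-major) order flips the picture.
f = np.asfortranarray(a)
f_col_is_contig = f[:, 1].flags["C_CONTIGUOUS"]
```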

Since PyTables doesn't currently do what I thought it did, we'll
probably move to using memmapped ndarrays directly, as the webinar
describes. But the natural question is, could PyTables possibly do
what I thought it could? It might be very hard to handle compressed
data, but uncompressed data seems possible; if the data is contiguous
in the HDF5 file, all we really need is a way to get that data in
memory, or at least its offset into the file. Poking around the HDF5
API[4], I don't see an obvious way to do that, but I do wonder if
anyone has given it any thought.
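
To make the "offset into the file" idea concrete, here is a sketch of
what the numpy side would look like if the offset were known. The
container file below is a fake: 512 bytes of padding followed by raw
array data. HEADER_BYTES, the file name, and the shape are all made up;
nothing here gets the offset from HDF5 or PyTables.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the byte offset at which a contiguous,
# uncompressed dataset would start inside a larger container file.
HEADER_BYTES = 512
rows, dims = 4, 3
data = np.arange(rows * dims, dtype=np.float64).reshape(rows, dims)

# Build the fake container: opaque header bytes, then the raw array.
path = os.path.join(tempfile.mkdtemp(), "container.bin")
with open(path, "wb") as f:
    f.write(b"\0" * HEADER_BYTES)
    f.write(data.tobytes())

# Given the offset, shape, and dtype, numpy can map just that region
# of the file read-only, ignoring everything before it.
view = np.memmap(path, dtype=np.float64, mode="r",
                 offset=HEADER_BYTES, shape=(rows, dims))
```

The dtype, shape, and byte order would also have to match what is
actually stored in the file.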

Thanks,
-Ken

[1] http://sourceforge.net/mailarchive/message.php?msg_id=200809271036.50004.faltet%40pytables.com
[2] http://www.enthought.com/training/SCPwebinar.php#w2009-05-22
[3] http://www.slideshare.net/enthought/python-for-scientific-computing-webinar-may-22-2009
[4] http://www.hdfgroup.org/HDF5/doc/RM/RM_H5Front.html

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
