On Tue, Dec 7, 2010 at 6:28 AM, Francesc Alted <[email protected]> wrote:
> In addition, memory mapping also has drawbacks, being
> the most important one the inability to map files that are larger than
> your available virtual memory, which renders this technology inadequate
> for many uses.

I guess I knew about the VM limitation but it hadn't completely sunk in.
So in fact mmap's limit is the same as RAM's:

RAM  -> big initial load  -> VM limited
mmap -> zero initial load -> VM limited
disk -> zero initial load -> disk limited

That is interesting. The VM limit will probably last us a long time if we
crank up the swap size on our machines. But maybe not forever; a fully
disk-based solution is probably the most future-proof.

> What many bindings for high-level languages are doing (specially in
> Python) is to treat datasets on-disk (available at low-level via HDF or
> NetCDF libraries) like if they were datasets in-memory...

We looked at PyTables, but at the time, since we were loading everything
into RAM, it seemed like overkill. Now maybe we should consider it again,
but we've already spent a lot of time writing a C++ library that we call
from C++ or from Python.

We have a lot of options:

1) HDF5 format -> PyTables
2) HDF5 format -> custom paging in our C++ library
3) custom binary format -> mmap
4) SQL database
5) key/value store (e.g. redis)

mmap strikes me as the least-change option, and likely to perform quite
well. But it would be a shame to use a custom binary format.

Thanks for all the input. I won't reply to every message, but there are
lots of good ideas here; we appreciate it.

-Philip
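
P.S. To make the "zero initial load" behaviour of mmap concrete, here is a
rough sketch from the Python side. It is not our actual library: the file
layout (raw little-endian float64 values with no header) and the names
write_values and read_value are made up purely for illustration. The point
is only that opening the mapping reads nothing up front, and a lookup
touches just the pages it needs.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical format: a flat run of little-endian float64 values, no header.
RECORD = struct.Struct("<d")


def write_values(path, values):
    """Write the values to disk in the hypothetical flat binary format."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def read_value(path, index):
    """Return one value by index without reading the whole file into RAM."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the OS pages data in lazily.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            (value,) = RECORD.unpack_from(m, index * RECORD.size)
            return value


path = os.path.join(tempfile.mkdtemp(), "data.bin")
write_values(path, [float(i) for i in range(100000)])
result = read_value(path, 54321)
```

The mapping still has to fit in the process's virtual address space, so
this stays "VM limited" as in the comparison above.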
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
