On Tue, Dec 7, 2010 at 6:28 AM, Francesc Alted wrote:

> In addition, memory mapping also has drawbacks, the most important
> being the inability to map files that are larger than your available
> virtual memory, which renders this technology inadequate
> for many uses.
>

I guess I knew about the VM limitation but it hadn't completely sunk in.

So in fact mmap's limit is the same as RAM:
RAM -> big initial load -> VM limited
mmap -> zero initial load -> VM limited
disk -> zero initial load -> disk limited
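The three rows above can be illustrated with a small Python sketch. It assumes a hypothetical flat file of little-endian float64 values with no header; the helper names are made up for illustration, and only the standard-library `mmap` and `struct` modules are real APIs. Note that the mmap variant reads only the pages it touches, but the whole file is still mapped into the process's address space.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.calcsize("<d")  # 8 bytes per little-endian float64

def write_doubles(path, values):
    """Write values as a flat float64 file (hypothetical headerless format)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))

def read_all_into_ram(path):
    """RAM: big initial load; every value is read up front."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f"<{len(data) // ITEM}d", data))

def read_one_via_mmap(path, i):
    """mmap: no initial load; the OS pages in only the pages touched,
    but the whole file is mapped into the address space."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return struct.unpack_from("<d", m, i * ITEM)[0]

def read_one_via_seek(path, i):
    """Disk: no initial load and no mapping; explicit seek + read."""
    with open(path, "rb") as f:
        f.seek(i * ITEM)
        return struct.unpack("<d", f.read(ITEM))[0]

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.bin")
    write_doubles(path, [float(k) for k in range(1000)])
    print(read_all_into_ram(path)[500], read_one_via_mmap(path, 500), read_one_via_seek(path, 500))
```

All three return the same value; they differ only in when the data is read and what resource (RAM, address space, or disk) bounds the file size.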

That is interesting.  For us the VM limit will probably suffice for a long
time if we crank up the swap size on our machines.  But maybe not forever;
a fully disk-based solution is probably the most future-proof.

> What many bindings for high-level languages are doing (especially in
> Python) is to treat datasets on disk (available at low level via the HDF or
> NetCDF libraries) as if they were datasets in memory...
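A toy version of that idea, as a minimal sketch: a hypothetical `DiskArray` class that supports ordinary Python indexing and slicing but reads each request from the file on demand. It assumes a made-up headerless file of little-endian float64 values. This is not the PyTables or h5py API; those libraries offer this kind of interface over real HDF5 datasets, with chunking, compression and typed metadata.

```python
import os
import struct
import tempfile

class DiskArray:
    """Hypothetical read-only float64 array that lives on disk.

    Indexing reads only the requested bytes, so the data is never held
    in RAM as a whole and nothing is memory-mapped.
    """
    _FMT = "<d"
    _ITEM = struct.calcsize(_FMT)

    def __init__(self, path):
        self._f = open(path, "rb")
        self._n = os.path.getsize(path) // self._ITEM

    def __len__(self):
        return self._n

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(self._n)
            if step != 1:
                # Simple fallback for strided slices: one read per element.
                return [self[i] for i in range(start, stop, step)]
            count = max(0, stop - start)
            self._f.seek(start * self._ITEM)
            data = self._f.read(count * self._ITEM)  # one contiguous read
            return list(struct.unpack(f"<{count}d", data))
        i = key + self._n if key < 0 else key  # support negative indices
        if not 0 <= i < self._n:
            raise IndexError("DiskArray index out of range")
        self._f.seek(i * self._ITEM)
        return struct.unpack(self._FMT, self._f.read(self._ITEM))[0]

    def close(self):
        self._f.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

Calling code can write `a[3]` or `a[100:200]` exactly as it would on an in-memory list, which is what makes swapping an in-RAM dataset for an on-disk one relatively painless.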


We looked at PyTables, but at the time, since we were loading everything into
RAM, it seemed like overkill. Now maybe we should consider it again, but
we've already spent a lot of time writing a C++ library that we call from
C++ or from Python.

We have a lot of options:
1) HDF5 format -> PyTables
2) HDF5 format -> custom paging in our C++ library
3) custom binary format -> mmap
4) SQL database
5) key/value store (e.g. redis)
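Option 2 roughly amounts to doing in application code the paging that mmap gets for free from the OS. A minimal sketch, written in Python rather than in the C++ library, assuming a read-only file split into fixed-size pages with least-recently-used eviction; `PagedReader`, `page_size` and `max_pages` are hypothetical names. Unlike mmap, memory use is capped at about `page_size * max_pages` bytes regardless of file size.

```python
from collections import OrderedDict

class PagedReader:
    """Hypothetical LRU page cache over a read-only file."""

    def __init__(self, path, page_size=4096, max_pages=64):
        self._f = open(path, "rb")
        self._page_size = page_size
        self._max_pages = max_pages
        self._pages = OrderedDict()  # page number -> bytes, oldest first
        self.misses = 0              # pages actually read from disk

    def _page(self, n):
        if n in self._pages:
            self._pages.move_to_end(n)  # mark as most recently used
            return self._pages[n]
        self.misses += 1
        self._f.seek(n * self._page_size)
        data = self._f.read(self._page_size)
        self._pages[n] = data
        if len(self._pages) > self._max_pages:
            self._pages.popitem(last=False)  # evict least recently used
        return data

    def read(self, offset, size):
        """Return up to `size` bytes starting at `offset`, possibly spanning pages."""
        out = bytearray()
        while size > 0:
            n, start = divmod(offset, self._page_size)
            chunk = self._page(n)[start:start + size]
            if not chunk:
                break  # past end of file
            out += chunk
            offset += len(chunk)
            size -= len(chunk)
        return bytes(out)

    def close(self):
        self._f.close()
```

The trade-off against mmap is that the cache size and eviction policy are under the application's control, at the cost of writing and maintaining that code.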

mmap strikes me as the least-change option, and it would likely perform quite
well.  But it would be a shame to use a custom binary format.

Thanks for all the input. I won't reply to every message, but there are lots
of good ideas here, and we appreciate it.

-Philip
_______________________________________________
Hdf-forum is for HDF software users discussion.
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
