On Tuesday 07 December 2010 04:13:10 Philip Winston wrote:
> Yes I think hyperslabs would be an essential tool for us.
>
> Our hyperslabs would essentially be just single rows. That worries
> me a little how it would perform but I should just try it and see.
> We need to be able to extend the dataset so we need chunking just
> for that, if nothing else.
>
> As per my other email I am worried maybe reading/writing single rows
> is not a good fit for HDF5? But again I should really just
> experiment and see. Thanks.

In my experience, when you want raw speed, there is little that can
compete with mmap. However, you will often want to trade some of that
performance for more functionality. Memory mapping also has drawbacks,
the most important being the inability to map files larger than your
available virtual memory, which makes it inadequate for many uses.

What many bindings for high-level languages do (especially in Python)
is treat on-disk datasets (available at a low level via the HDF or
NetCDF libraries) as if they were in-memory datasets. That way you
effectively work with on-disk data as if it were in memory (this is
what you are after, IIUC). The OS filesystem cache is then in charge of
keeping as much data as possible in memory, so you get performance very
close to that of a memory-mapped approach.

Of course, you still have the HDF/NetCDF/whatever layer, which
introduces some overhead, but this is largely compensated for by other
nice features, such as on-the-fly compression (which may effectively
accelerate disk I/O) or practically unlimited dataset capacity (i.e.
exceeding the virtual memory boundaries), among many others.

As for comparing binary formats (like HDF5) with relational databases,
the former frequently perform better than the latter, especially if the
interface is optimized.

Hope this helps,

--
Francesc Alted
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
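
For readers who want to try the memory-mapped baseline mentioned in the reply above, here is a minimal sketch of single-row reads and writes through mmap, using only the Python standard library. The row layout (four float64 values), the file name, and the helper names (create_table, write_row, read_row, demo) are all assumptions made for illustration; nothing in the thread specifies them.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical row layout: four little-endian float64 values per row.
ROW_FORMAT = "<4d"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # bytes occupied by one row


def create_table(path, n_rows):
    """Create a zero-filled file holding n_rows fixed-size rows."""
    with open(path, "wb") as f:
        f.write(b"\0" * (n_rows * ROW_SIZE))


def write_row(mm, index, values):
    """Overwrite one row in place through the memory map."""
    struct.pack_into(ROW_FORMAT, mm, index * ROW_SIZE, *values)


def read_row(mm, index):
    """Read one row; the OS pages in only the part of the file it needs."""
    return struct.unpack_from(ROW_FORMAT, mm, index * ROW_SIZE)


def demo():
    """Write a single row, then read it and an untouched row back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "rows.bin")
        create_table(path, n_rows=1000)
        with open(path, "r+b") as f:
            # Length 0 maps the whole file.
            with mmap.mmap(f.fileno(), 0) as mm:
                write_row(mm, 42, (1.0, 2.0, 3.0, 4.0))
                row = read_row(mm, 42)
                untouched = read_row(mm, 0)
    return row, untouched


if __name__ == "__main__":
    print(demo())
```

Note that this sketch has a fixed row count: appending rows would mean growing the file and mapping it again, which is the kind of bookkeeping an extendable, chunked HDF5 dataset takes care of.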
