Hi, in our organization the data that we need to store in HDF5 varies greatly in size, from small objects of 10-20 bytes to very large objects of several MB each (typically images). The chunks we create for the large objects tend to be large as well, and they exceed the default HDF5 chunk cache size (1 MB). That of course means that with the default settings these large chunks are never cached in memory.
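(To make that concrete: the per-dataset workaround in the C API looks roughly like the sketch below, using H5Pset_chunk_cache on a dataset access property list. The file name, dataset name and the 32 MB figure are just examples, and error checking is omitted.)

#include "hdf5.h"

int main(void)
{
    /* Open a dataset with a 32 MB per-dataset chunk cache instead of the
     * 1 MB default.  Names and sizes here are only illustrative. */
    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);

    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl,
                       H5D_CHUNK_CACHE_NSLOTS_DEFAULT, /* keep default number of slots */
                       32 * 1024 * 1024,               /* raw data chunk cache: 32 MB  */
                       H5D_CHUNK_CACHE_W0_DEFAULT);    /* keep default preemption policy */

    hid_t dset = H5Dopen2(file, "/images", dapl);

    /* ... read with H5Dread as usual; several-MB chunks can now stay cached ... */

    H5Dclose(dset);
    H5Pclose(dapl);
    H5Fclose(file);
    return 0;
}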
We cannot reduce our chunk size, as that would lead to far too many chunks and cause other sorts of problems. The standard solution, of course, is to set the chunk cache size to a larger value when reading the data, as in the sketch above. This does not work well for us because we use a multitude of tools for HDF5 access (C++, Matlab, h5py, IDL, etc.) and have too many users who would need to be taught how to change the cache size setting in each of those tools (which is not always trivial). The only reasonable solution I have found so far is to patch the HDF5 sources to increase the default cache size from 1 MB to 32 MB. That has its own troubles, because of course not everyone uses our patched HDF5 library.

I think it would be beneficial in cases like ours for HDF5 to have, by default, an adaptive algorithm that can fit larger chunks in the cache. Would it be possible to add something like this to a future HDF5 version? I don't think it has to be complex; the simplest rule would probably be "make sure that at least one chunk fits in the cache unless the user provides an explicit cache size for the dataset". If help is needed, I could try to produce a patch that does this (I will need some time to understand the code, of course); a rough sketch of what I mean is appended below.

Thanks,
Andy
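P.S. Purely as an illustration of the rule I have in mind (this is not actual HDF5 internals, just a hypothetical helper; all names are made up):

#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper illustrating the proposed default: if the user did
 * not set an explicit chunk cache size for the dataset, grow the default
 * so that at least one chunk fits. */
size_t effective_cache_nbytes(size_t default_nbytes,    /* library default, currently 1 MB */
                              size_t chunk_nbytes,      /* size of one chunk of the dataset */
                              int    user_set_explicit, /* nonzero if the user set a cache size */
                              size_t user_nbytes)       /* the user's explicit value, if any */
{
    if (user_set_explicit)
        return user_nbytes;   /* never override an explicit setting */
    return chunk_nbytes > default_nbytes ? chunk_nbytes : default_nbytes;
}

int main(void)
{
    /* 4 MB chunk, no explicit setting: the cache grows to hold one chunk. */
    printf("%zu\n", effective_cache_nbytes(1024 * 1024, 4 * 1024 * 1024, 0, 0));
    return 0;
}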
