Jens,

From running h5stat on the file, it looks like about 2/3 of the file space (~8 MB) is taken up by dataset chunk indexes. Since the datasets are so small, it would probably be a good idea to store them as contiguous (or even compact) instead of chunked. The rest of the space is mostly object headers (~1.5 MB for groups, ~2 MB for datasets), which the new file format (H5Pset_libver_bounds(..., H5F_LIBVER_LATEST, H5F_LIBVER_LATEST)) should help with, if you aren't already using it. I also noticed that half of the dataset object header space is unused, so repacking the file may help there.
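
Since the original question mentions h5py, here is a minimal sketch of what the two suggestions above might look like from Python. It assumes h5py and numpy are installed; the file name, the group and dataset names, and the helper write_small_datasets are made up for illustration. In h5py, libver="latest" is the counterpart of the H5Pset_libver_bounds call, and a dataset created without chunks, compression, or maxshape is stored contiguously. Turning on gzip forces a chunked layout, which may be where the chunk indexes come from.

```python
import h5py
import numpy as np


def write_small_datasets(h5file, n_groups=100, compress=False):
    """Create n_groups groups, each with 3 small datasets and 3 attributes.

    Hypothetical helper: the names and sizes are placeholders, not the
    layout of the real simulation file.
    """
    # Asking for gzip makes h5py pick a chunked layout automatically.
    opts = {"compression": "gzip"} if compress else {}
    for i in range(n_groups):
        grp = h5file.create_group(f"node_{i:04d}")
        for name in ("a", "b", "c"):
            grp.create_dataset(name, data=np.arange(10, dtype="f8"), **opts)
        grp.attrs["step"] = i
        grp.attrs["label"] = "example"
        grp.attrs["scale"] = 1.0


# In-memory files (core driver, no backing store) keep the sketch off disk.
mem = dict(driver="core", backing_store=False)

# Newest file format, no compression: datasets come out contiguous.
with h5py.File("layout_plain.h5", "w", libver="latest", **mem) as f:
    write_small_datasets(f, n_groups=5)
    contiguous_chunks = f["node_0000/a"].chunks  # None means not chunked

# Same data with gzip: every tiny dataset now carries a chunk index.
with h5py.File("layout_gzip.h5", "w", libver="latest", **mem) as f:
    write_small_datasets(f, n_groups=5, compress=True)
    chunked_chunks = f["node_0000/a"].chunks  # a tuple such as the chunk shape

print("no compression ->", contiguous_chunks)
print("gzip           ->", chunked_chunks)
```

For a file that already exists, h5repack can rewrite it with a different layout; its -l option accepts layout names such as CONTI and COMPA, but it is worth checking the help output of your installed version before relying on that.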

Thanks,
-Neil

On 02/16/2011 01:34 PM, Jens Thoms Toerring wrote:
Hi nls,

On Wed, Feb 16, 2011 at 06:52:15AM -0800, nls wrote:
I am using HDF5 via h5py to store simulation data. The data are hierarchical
and I am using a nested tree of HDF5 Groups to store them. Each Group has
about 3 small Datasets, 3 Attributes, and fewer than 10 descendants.

My problem is that writing is kind of slow and the files are big. They also
seem very redundant since compressing the whole file with gzip gives almost
20x compression ratio while turning on gzip compression for the datasets has
almost no effect on file size. I also tried to set the new Group
compact/indexed storage format which reduces file size only a little.

Am I doing something wrong in the layout of the file? The actual data
hierarchy cannot be changed, but maybe I can rearrange data differently?

Perhaps you've got a problem similar to the one I asked about
here in October last year? I noticed that when creating a lot
of groups with only small data sets the files got rather large
compared to what I was expecting. The result of the discussion
was that creating a group isn't inexpensive and requires on the
order of one kilobyte. The friendly answer by Quincey Koziol can
be found here:

http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/2010-October/003801.html
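
The per-group cost is easy to measure for a given setup. The following is a rough sketch, assuming h5py is installed; the helper file_size_with_groups, the scratch file name, and the group names are invented for illustration. It writes a number of empty groups to a temporary file and divides the growth in file size by the group count, once for the oldest and once for the newest file format. The numbers will depend on the HDF5 version and on what each group contains, so treat them as an estimate only.

```python
import os
import tempfile

import h5py


def file_size_with_groups(n_groups, libver="earliest"):
    """Write n_groups empty groups to a scratch file; return its size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "groups.h5")
        with h5py.File(path, "w", libver=libver) as f:
            for i in range(n_groups):
                f.create_group(f"g{i:05d}")
        # The file is closed (and flushed) before its size is read.
        return os.path.getsize(path)


n = 1000
per_group = {}
for libver in ("earliest", "latest"):
    empty = file_size_with_groups(0, libver)
    full = file_size_with_groups(n, libver)
    # Average growth per group, ignoring the fixed cost of an empty file.
    per_group[libver] = (full - empty) / n
    print(f"libver={libver}: roughly {per_group[libver]:.0f} bytes per empty group")
```

Multiplying the measured figure by the number of groups in the real tree gives a first guess at how much of the file is group metadata rather than data.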

I wouldn't be too surprised if the information stored for the
groups has more common patterns than data and thus is easier to
compress. Of course, I don't know if this has any relevance to
your problem; your description just rang a bell ;-)

                          Best regards, Jens

