Hi nls,

On Wed, Feb 16, 2011 at 06:52:15AM -0800, nls wrote:
> I am using HDF5 via h5py to store simulation data. The data are hierarchical
> and I am using a nested tree of HDF5 Groups to store them. Each Group has
> about 3 Datasets which are small, 3 Attributes, and a number <10 of
> descendants.
> 
> My problem is that writing is kind of slow and the files are big. They also
> seem very redundant since compressing the whole file with gzip gives almost
> 20x compression ratio while turning on gzip compression for the datasets has
> almost no effect on file size. I also tried to set the new Group
> compact/indexed storage format which reduces file size only a little.
> 
> Am I doing something wrong in the layout of the file? The actual data
> hierarchy cannot be changed, but maybe I can rearrange data differently?

Perhaps you have a problem similar to the one I asked about
here in October last year? I noticed that when creating a lot
of groups containing only small data sets, the files got rather
large compared to what I was expecting. The result of the
discussion was that creating a group isn't cheap: each one
takes up on the order of one kilobyte in the file. The friendly
answer by Quincey Koziol can be found here:

http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/2010-October/003801.html
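To get a feeling for what that means for a tree like yours, here is
a back-of-the-envelope sketch. It simply assumes a flat ~1 KiB per
group, which is only the order of magnitude from that thread and not
an exact figure. The real number depends on the HDF5 version and file
settings. It also assumes a full tree with a hypothetical branching
factor and depth. The helper names count_groups() and
estimate_group_overhead() are made up for illustration.

```python
# Back-of-the-envelope estimate of HDF5 group metadata overhead.
# ASSUMPTION: ~1 KiB per group, the order of magnitude quoted on
# hdf-forum in October 2010; the real value depends on the library
# version, libver settings and the number of links/attributes.

BYTES_PER_GROUP = 1024  # assumed order of magnitude, not an exact value


def count_groups(branching: int, depth: int) -> int:
    """Groups in a full tree: the root plus branching**k groups at level k."""
    return sum(branching ** level for level in range(depth + 1))


def estimate_group_overhead(branching: int, depth: int,
                            bytes_per_group: int = BYTES_PER_GROUP) -> int:
    """Estimated bytes spent on group metadata alone (datasets not counted)."""
    return count_groups(branching, depth) * bytes_per_group


if __name__ == "__main__":
    # Hypothetical tree: 9 children per group, 5 levels below the root.
    n = count_groups(branching=9, depth=5)
    mib = estimate_group_overhead(9, 5) / 2**20
    print(f"{n} groups, ~{mib:.1f} MiB of group overhead")
```

With 9 children per group and 5 levels below the root that's 66430
groups, i.e. roughly 65 MiB spent on group metadata alone, before a
single byte of actual simulation data is written.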
 
I wouldn't be too surprised if the information stored for the
groups has more common patterns than the data itself and is thus
easier to compress. Of course, I don't know whether this is
relevant to your problem; your description just rang a bell ;-)
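Here is a toy illustration of why gzipping the whole file and
gzipping the individual datasets can behave so differently. HDF5's
gzip filter only compresses (chunked) dataset data, never the
metadata for groups and object headers, and compressing a couple of
dozen bytes at a time gains nothing. The "file" below is not real
HDF5. Each hypothetical group is modelled as a mostly constant 1 KiB
metadata block plus 24 bytes of random data, and only Python's zlib
module is used.

```python
import os
import zlib

# Toy model (NOT the real HDF5 on-disk format): each group is a
# hypothetical, mostly constant ~1 KiB metadata block plus one small
# dataset of 3 float64 values (24 bytes of effectively random data).
N_GROUPS = 2000
META_SIZE = 1024
DATA_SIZE = 24


def fake_group(i: int) -> tuple[bytes, bytes]:
    """Return (metadata, data) for hypothetical group number i."""
    name = f"group_{i:06d}".encode()
    # Fixed-structure metadata: a name padded with zeros, standing in
    # for repetitive object headers, link tables and heaps.
    meta = name.ljust(META_SIZE, b"\0")
    data = os.urandom(DATA_SIZE)  # random bytes: incompressible payload
    return meta, data


groups = [fake_group(i) for i in range(N_GROUPS)]

# Case 1: compress only each small dataset on its own (like a gzip
# filter on tiny datasets); the metadata blocks stay uncompressed.
per_dataset_size = sum(META_SIZE + len(zlib.compress(d)) for _, d in groups)

# Case 2: compress the whole "file" as one stream (like gzip on the .h5 file).
whole_file = b"".join(m + d for m, d in groups)
whole_file_size = len(zlib.compress(whole_file))

raw_size = len(whole_file)
print(f"raw: {raw_size}  per-dataset gzip: {per_dataset_size}  "
      f"whole-file gzip: {whole_file_size}")
```

Compressing each 24-byte "dataset" separately actually makes the
total slightly larger, while compressing everything in one stream
shrinks it by more than a factor of ten, almost entirely thanks to
the repetitive metadata blocks.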

                         Best regards, Jens
-- 
  \   Jens Thoms Toerring  ________      [email protected]
   \_______________________________      http://toerring.de

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
