On Wed, 16 Feb 2011 06:52:15 -0800 (PST)
 nls <[email protected]> wrote:
>
>Hi everyone, 
>
>I am using HDF5 via h5py to store simulation data. The data are hierarchical
>and I am using a nested tree of HDF5 Groups to store them. Each Group has
>about 3 small Datasets, 3 Attributes, and fewer than 10
>descendants.
>
>My problem is that writing is kind of slow and the files are big. They also
>seem very redundant since compressing the whole file with gzip gives almost
>20x compression ratio while turning on gzip compression for the datasets has
>almost no effect on file size. I also tried enabling the new
>compact/indexed Group storage format, which reduces file size only a little.
>
>Am I doing something wrong in the layout of the file? The actual data
>hierarchy cannot be changed, but maybe I can rearrange data differently?
>
>Here is a link to an example file if anyone would like to have a look: 
>http://dl.dropbox.com/u/5077634/br_0.h5.tar.gz  (760k compressed, 3500
>Groups, 7000 Datasets)
>
>Thanks for any hints!

You can check to see if the datasets are compressed using the h5dump command 
with the -p option:
h5dump -p -H filename.h5

Compressed datasets will have something like the following listed:
   FILTERS {
      COMPRESSION DEFLATE { LEVEL 5 }
   }

Since you're not seeing a size decrease, you may want to double-check that the
datasets are actually being compressed. I've made this mistake in the past.
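
You can also verify this from h5py itself. Here's a minimal sketch (assuming
h5py and numpy are installed; the file name "example.h5" is just for
illustration) showing how to request gzip compression at dataset creation and
confirm the filter was applied -- the same information h5dump -p reports:

```python
# Minimal sketch: create a gzip-compressed dataset with h5py and verify
# that the DEFLATE filter was actually applied.
import numpy as np
import h5py

with h5py.File("example.h5", "w") as f:
    # compression="gzip" implies chunked storage; without it the dataset
    # is stored contiguously and no filter can be applied.
    dset = f.create_dataset("data", data=np.arange(1000),
                            compression="gzip", compression_opts=5)
    print(dset.compression)       # "gzip" if the filter took effect
    print(dset.compression_opts)  # 5
```

Note that keyword arguments like compression= are silently ignored in some
code paths if you write datasets another way (e.g. plain assignment into a
Group), which is an easy way to end up with uncompressed data.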
Cheers,
-Corey

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
