Hi nls,

On Wed, Feb 16, 2011 at 06:52:15AM -0800, nls wrote:
> I am using HDF5 via h5py to store simulation data. The data are hierarchical
> and I am using a nested tree of HDF5 Groups to store them. Each Group has
> about 3 Datasets which are small, 3 Attributes, and a number <10 of
> descendants.
>
> My problem is that writing is kind of slow and the files are big. They also
> seem very redundant since compressing the whole file with gzip gives almost
> 20x compression ratio while turning on gzip compression for the datasets has
> almost no effect on file size. I also tried to set the new Group
> compact/indexed storage format which reduces file size only a little.
>
> Am I doing something wrong in the layout of the file? The actual data
> hierarchy cannot be changed, but maybe I can rearrange data differently?

Perhaps you've got a problem similar to the one I asked about here in
October last year? I noticed that when creating a lot of groups with only
small data sets, the files got rather large compared to what I was
expecting. The result of the discussion was that creating a group isn't
inexpensive and requires on the order of one kilobyte. The friendly
answer by Quincey Koziol can be found here:

http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/2010-October/003801.html

I wouldn't be too surprised if the information stored for the groups has
more common patterns than the data and is thus easier to compress. That
would fit what you see: gzip on the whole file also covers the group
metadata, while gzip on the datasets only touches the small payloads.

Of course, I don't know if this has any relevance to your problem; your
description just rang a bell ;-)

Best regards, Jens
-- 
Jens Thoms Toerring
http://toerring.de
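
P.S. To judge whether per-group overhead could explain the file size, a back-of-envelope estimate may help. The sketch below is only an illustration: the 1 KB per group is the order-of-magnitude figure from the thread linked above, while the function name `estimate_sizes`, the 80 bytes per dataset (say, ten 8-byte floats) and the group count are hypothetical values to be replaced with your own.

```python
# Back-of-envelope estimate only; every number here is an assumption.
# "On the order of one kilobyte" per group is the figure from the thread.
GROUP_OVERHEAD_BYTES = 1024


def estimate_sizes(n_groups, datasets_per_group=3, bytes_per_dataset=80):
    """Return (metadata_bytes, payload_bytes, metadata_fraction).

    Only the per-group cost is counted. Per-dataset and per-attribute
    metadata are ignored, so the real overhead is likely higher.
    """
    metadata = n_groups * GROUP_OVERHEAD_BYTES
    payload = n_groups * datasets_per_group * bytes_per_dataset
    return metadata, payload, metadata / (metadata + payload)


if __name__ == "__main__":
    # Hypothetical tree with 100,000 groups.
    meta, data, frac = estimate_sizes(100_000)
    print(f"group metadata:  {meta / 1e6:.1f} MB")
    print(f"dataset payload: {data / 1e6:.1f} MB")
    print(f"metadata share:  {frac:.0%}")
```

If the metadata share comes out large for your numbers, packing the small datasets of many groups into a few larger datasets (and keeping the hierarchy as an index into them) might be worth trying, since dataset-level gzip cannot touch the group metadata.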
