Paul Anton Letnes wrote:
Compression might screw with that, though. An idea is to use rsync's compression instead, and leave the HDF5 files uncompressed. From man rsync:
  -z, --compress              compress file data during the transfer

Cheers
Paul
I'm a bit reluctant to turn off HDF5 compression given the storage requirements of the data in question (it's an enormous amount of data that happens to compress very well). Besides, aren't the chunks compressed individually? My understanding is that each chunk is compressed independently, so the only things that should change between syncs are the chunks that have had new data written to them, plus the chunk index (I forget the term the devs use). How much actually changes probably depends a lot on how the data was pre-allocated.
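
Incidentally, newer h5py releases (2.10+, against HDF5 1.10.5+) expose the chunk index, so the per-chunk layout can be inspected directly rather than guessed at. A minimal sketch; the file and dataset names below are placeholders, not my actual data:

  import h5py

  # List where each compressed chunk lives in the file. If chunks really
  # are compressed independently, appending data should only add or
  # rewrite the affected chunk records plus the index, not the neighbours.
  with h5py.File("data.h5", "r") as f:      # placeholder file name
      dset = f["measurements"]              # placeholder dataset name
      for i in range(dset.id.get_num_chunks()):
          info = dset.id.get_chunk_info(i)
          print(f"chunk at {info.chunk_offset}: "
                f"byte_offset={info.byte_offset} size={info.size}")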

Now, I'm using the gzip *filter*, in conjunction with the shuffle filter. There's no sign of an "rsyncable" option there, nor in the (admittedly dated) gzip binaries I have installed. Using gzip outside of HDF5 would require an awful lot of re-engineering, and even then it would significantly limit the accessibility of the data.
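
For concreteness, the filter stack amounts to something like the following h5py sketch. The names, shapes, and gzip level are illustrative only, not my actual setup:

  import numpy as np
  import h5py

  # Shuffle runs before gzip in the HDF5 filter pipeline, which tends to
  # improve compression of numeric data. Each chunk passes through the
  # pipeline on its own, so a chunk is the unit of compression and I/O.
  with h5py.File("data.h5", "w") as f:      # placeholder file name
      dset = f.create_dataset(
          "measurements",                   # placeholder dataset name
          shape=(0, 1024),
          maxshape=(None, 1024),            # unlimited first axis for appends
          chunks=(256, 1024),
          compression="gzip",
          compression_opts=4,               # gzip level; there is no "rsyncable" knob
          shuffle=True,
      )
      # Appending touches only the new chunks (and the index):
      dset.resize(dset.shape[0] + 256, axis=0)
      dset[-256:] = np.zeros((256, 1024))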

One thing I might do to quantify this is save a copy of one of the files between rsyncs and binary-diff the two afterward to see what's really changing. Unfortunately, I don't have the ins and outs of the HDF5 file format stored in my brain, so interpreting the results of such a test will be time-consuming.
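
In case anyone wants to try the same test, a crude block-level diff along these lines should do. The block size is arbitrary; matching the on-disk chunk size would be ideal:

  import sys

  BLOCK = 4096  # arbitrary; matching the on-disk chunk size would be better

  # Compare two snapshots of a file block by block and report how much
  # changed. A small fraction would support the "only touched chunks plus
  # the index get rewritten" theory.
  def diff_fraction(path_a, path_b, block=BLOCK):
      changed = total = 0
      with open(path_a, "rb") as a, open(path_b, "rb") as b:
          while True:
              block_a, block_b = a.read(block), b.read(block)
              if not block_a and not block_b:
                  break
              total += 1
              if block_a != block_b:
                  changed += 1
      return changed, total

  if __name__ == "__main__":
      changed, total = diff_fraction(sys.argv[1], sys.argv[2])
      pct = 100.0 * changed / total if total else 0.0
      print(f"{changed}/{total} blocks differ ({pct:.1f}%)")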

