Paul Anton Letnes wrote:
> Compression might screw with that, though. An idea is to use rsync
> compression instead, and leave the hdf5 files uncompressed. From man
> rsync:
>     -z, --compress    compress file data during the transfer
> Cheers,
> Paul
I'm a bit reluctant to turn off compression due to the storage
requirements of the data in question (it's an enormous amount of data
that happens to compress very well). Still, aren't the chunks
compressed individually? My understanding is that each chunk is
compressed independently, so the only things that should change are the
chunks that have had new data added, plus the chunk index (a B-tree, if
I remember right; I forget the exact term the devs use). How much
actually changes probably depends a lot on how the data is
pre-allocated.
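One way to check that assumption directly (a sketch I haven't run; it
assumes a recent h5py/HDF5 that exposes the low-level chunk query
calls, and the file and dataset names are made up) is to dump the byte
offset and on-disk size of every chunk before and after an append, and
diff the output:

    import h5py

    # Sketch: list the file offset and on-disk size of each chunk in a
    # chunked dataset. Running this before and after appending data
    # (and diffing the output) should show which chunks actually moved.
    # "data.h5" and "mydata" are placeholders.
    with h5py.File("data.h5", "r") as f:
        dsid = f["mydata"].id
        for i in range(dsid.get_num_chunks()):
            info = dsid.get_chunk_info(i)
            print(info.chunk_offset, info.byte_offset, info.size)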
Now, I'm using the gzip *filter*, in conjunction with the shuffle
filter. There's no indication of an "rsyncable" option there, nor in
the (admittedly dated) gzip binaries I have installed. Using gzip
outside of HDF5 would require an awful lot of re-engineering, and even
then it would significantly limit the accessibility of the data.
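For concreteness, the setup is along these lines (an h5py sketch; the
chunk shape, dtype, and names here are placeholders, not my actual
parameters):

    import numpy as np
    import h5py

    # Sketch: chunked storage with the gzip and shuffle filters,
    # extendable along the first axis so new data can be appended.
    # Chunk shape, sizes, and names are placeholders.
    with h5py.File("data.h5", "a") as f:
        dset = f.create_dataset(
            "mydata",
            shape=(0, 1024),
            maxshape=(None, 1024),   # unlimited first axis
            chunks=(256, 1024),      # each chunk compressed on its own
            compression="gzip",
            shuffle=True,
        )
        # Appending: grow the dataset, then write only the new rows.
        new_rows = np.zeros((256, 1024))  # stand-in for real data
        dset.resize(dset.shape[0] + new_rows.shape[0], axis=0)
        dset[-new_rows.shape[0]:] = new_rows

With chunked storage like this, an append should in principle rewrite
only the new chunks plus the index, which is exactly what the rsync
behavior should either confirm or refute.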
One thing I might do to quantify this is save a copy of one of the
files between rsyncs and run a binary diff afterward to see what's
really changing. Unfortunately, I don't have the ins and outs of the
HDF5 file format memorized, so interpreting the results of such a test
will be time-consuming.
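The comparison itself is trivial; something like the following
(filenames and the 64 KiB block size are arbitrary) would at least
show how many blocks differ and roughly where:

    # Sketch: block-wise binary diff of two snapshots of the same
    # file. Reports the byte offsets of fixed-size blocks that differ;
    # "before.h5"/"after.h5" and the block size are arbitrary.
    BLOCK = 64 * 1024

    def changed_blocks(path_a, path_b, block=BLOCK):
        diffs = []
        with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
            i = 0
            while True:
                a, b = fa.read(block), fb.read(block)
                if not a and not b:
                    break
                if a != b:
                    diffs.append(i * block)  # offset of changed block
                i += 1
        return diffs

    offsets = changed_blocks("before.h5", "after.h5")
    print(len(offsets), "blocks differ; first offsets:", offsets[:10])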