Hmm. I have never thought about this before, but the first thing that
comes to mind is whether rsync supports 'diffing' binary, compressed,
chunked HDF5 files.

My naive understanding of tools like rsync is that they come
pre-packaged with the ability to diff ASCII text files but not binary
files of any kind.

Based on the behavior you describe, I suspect rsync cannot diff your
HDF5 files, so it is doing the only thing it really can: copying the
whole darn binary file.
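
One way to test that suspicion without involving rsync at all is to keep
two successive snapshots of the file and count how many fixed-size blocks
differ between them. The sketch below is only an illustration under
stated assumptions: the function names and the 4096-byte block size are
made up for this example, and it compares blocks at the same offset
only. rsync itself is documented as using rolling checksums that can
also match blocks that have moved, so this check can overestimate what
rsync would resend. It works on raw bytes, so it applies to binary files
as well.

```python
import hashlib


def block_digests(data, block_size):
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=4096):
    """Count blocks of `new` that differ from the same-offset block of `old`.

    Returns (changed, total). Blocks beyond the end of `old` count as changed.
    """
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    changed = sum(
        1
        for i, digest in enumerate(new_digests)
        if i >= len(old_digests) or digest != old_digests[i]
    )
    return changed, len(new_digests)


def changed_blocks_in_files(old_path, new_path, block_size=4096):
    """Convenience wrapper for two snapshot files (hypothetical paths).

    Reads both files fully into memory, which is fine for a quick
    experiment but not for very large files.
    """
    with open(old_path, "rb") as f:
        old = f.read()
    with open(new_path, "rb") as f:
        new = f.read()
    return changed_blocks(old, new, block_size)
```

If the changed-block count stays small and sits near the end of the file
after each update, the file is growing by appending. If it grows along
with the file size, the updates are touching many locations in the file.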

So, the next question is whether you can 'smarten' rsync so that it can
somehow diff HDF5 files, maybe using the h5diff tool. That's as far as
my thinking takes me. Good luck.
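
As a rough sketch of what calling h5diff from a script might look like,
the following assumes h5diff is installed and on the PATH, and relies on
its documented exit statuses (0 for no differences, 1 for differences
found, anything else for an error). The function names are hypothetical.
Note that h5diff only reports whether the contents differ; it does not
produce a delta that rsync could transfer, so at best this could decide
whether a copy is needed at all.

```python
import subprocess


def interpret_h5diff_status(status):
    """Map an h5diff exit status to a short description."""
    if status == 0:
        return "identical"
    if status == 1:
        return "different"
    return "error"


def hdf5_files_differ(old_path, new_path):
    """Return True if h5diff reports differences between the two files.

    Uses h5diff's quiet mode (-q) so that only the exit status matters.
    Raises RuntimeError if h5diff itself reports an error.
    """
    result = subprocess.run(["h5diff", "-q", old_path, new_path], check=False)
    outcome = interpret_h5diff_status(result.returncode)
    if outcome == "error":
        raise RuntimeError(
            "h5diff failed with exit status %d" % result.returncode
        )
    return outcome == "different"
```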

Mark


On Mon, 2011-12-19 at 10:37 -0800, John Knutson wrote:
> Does anyone out there have any experience using rsync to copy HDF5 
> files?  I've been trying to use rsync to make back-ups of HDF5 files as 
> they grow, but instead of the expected fairly constant time required for 
> each update, the rsync time increases as the HDF5 file grows.  This 
> suggests to me that rsync is re-transferring data instead of just 
> transferring differences.  That, or as I add data to the HDF5 file, 
> changes are being made to numerous locations in the file.
> 
> I thought maybe the problem was that the time spent doing checksums was 
> causing the increase as the files grew in size, but the rsync output 
> indicates a linear increase in actual data transferred as well, just 
> like the run time.
> 
> The files in question contain multiple data sets that are being updated, 
> each of which is stored as chunked, compressed data.
> 
> The only thing I can think of to fiddle with on the rsync end is the 
> checksum block size, to try to make it closer to the size of the blocks 
> in the HDF5 file, which is unknown to me at the moment.
> 
> Alternatively, I can make the files smaller, but that would not be my 
> first choice as it would be a major design change.
> 
> If anyone has any suggestions as to how to resolve this "creeping 
> transfer time" issue, I'd appreciate it.
> 
> 
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
-- 
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
[email protected]      urgent: [email protected]
T:8-6 (925)-423-5901    M/W/Th:7-12,2-7 (530)-753-8511

