Hmm. I never thought of this before, but one of the first things that comes to mind is whether rsync supports 'diffing' of binary, compressed, chunked HDF5 files.
My naive understanding of tools like rsync is that they come pre-packaged with the ability to diff ASCII text files but not binary files of any kind. Based on the behavior you describe, I suspect rsync cannot diff your HDF5 files, so it is doing the only thing it really can do: copying the whole darn binary file.

So, the next question is: can you 'smarten' rsync so that it can somehow diff HDF5 files, maybe using the h5diff tool?

That's as far as my thinking takes me. Good luck.

Mark

On Mon, 2011-12-19 at 10:37 -0800, John Knutson wrote:
> Does anyone out there have any experience using rsync to copy HDF5
> files? I've been trying to use rsync to make back-ups of hdf5 files as
> they grow, but instead of the expected fairly constant time required for
> each update, the rsync time increases as the HDF5 file grows. This
> suggests to me that rsync is re-transferring data instead of just
> transferring differences. That, or as I add data to the HDF5 file,
> changes are being made to numerous locations in the file.
>
> I thought maybe the problem was that the time spent doing checksums was
> causing the increase as the files grew in size, but the rsync output
> indicates a linear increase in actual data transferred as well, just
> like the run time.
>
> The files in question contain multiple data sets that are being updated,
> each of which is stored as chunked, compressed data.
>
> The only thing I can think of to fiddle with on the rsync end is the
> checksum block size, and try and make it more like the size of blocks in
> the HDF5 file, which is an unknown to me at the moment.
>
> Alternately, I can make the files smaller, but that would not be my
> first choice as it would be a major design change.
>
> If anyone has any suggestions as to how to resolve this "creeping
> transfer time" issue, I'd appreciate it.
>
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

-- 
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
[email protected] urgent: [email protected]
T:8-6 (925)-423-5901
M/W/Th:7-12,2-7 (530)-753-8511
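
P.S. One way to test the second hypothesis in the quoted message (that adding data changes numerous locations in the file) might be to keep two successive snapshots of the HDF5 file and count how many fixed-size blocks differ between them. The sketch below is only a rough diagnostic, not a model of what rsync does: it compares blocks at identical offsets, and the function name `count_changed_blocks` and the 4096-byte default block size are assumptions chosen for illustration.

```python
import io


def count_changed_blocks(old, new, block_size=4096):
    """Compare two binary streams block by block at identical offsets.

    Returns (changed, total), where total is the number of blocks in `new`.
    Blocks that lie past the end of `old` count as changed.
    """
    changed = total = 0
    while True:
        new_block = new.read(block_size)
        if not new_block:
            break  # reached the end of the newer snapshot
        old_block = old.read(block_size)  # b"" once `old` is exhausted
        total += 1
        if new_block != old_block:
            changed += 1
    return changed, total


# Toy demonstration: in-memory bytes stand in for two file snapshots.
old_snapshot = io.BytesIO(b"A" * 8192)
new_snapshot = io.BytesIO(b"A" * 8192 + b"B" * 4096)  # pure append
print(count_changed_blocks(old_snapshot, new_snapshot))  # prints (1, 3)
```

For real files, the two streams would come from something like `open(path, "rb")` on each snapshot. If only the last few blocks differ after a small append, the file is growing at the tail and the extra transfer has another cause; if a large fraction of blocks differ, data really is being rewritten throughout the file.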
