On Jul 13, 2015, at 10:34 AM, Andrey Repin <anrdae...@yandex.ru> wrote:
> In my environment, a small touch to the original file cause changes throughout
> the entirety of its stored image. ('cause storage format is actually an
> archive, and a small change here and there in the source file cause massive
> shifts in the resulting image.)

Unless those files are written using either whole-archive compression or 
whole-archive encryption, rsync should still be able to find substantial 
savings in the transfer with its rolling checksums.  rsync won’t be confused by 
simple changes like a new byte added to the middle of a file, shifting all 
subsequent bytes down by one.
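
To illustrate why an inserted byte does not defeat the block matching, here is
a minimal sketch of an rsync-style weak rolling checksum in Python. It is not
rsync's actual code: the 64-byte block size, the function names, and the plain
byte comparison standing in for the strong checksum are all assumptions made
for the example.

```python
import random

BLOCK = 64  # illustrative block size, not what rsync would choose


def weak_sum(block):
    """rsync-style weak checksum: two 16-bit running sums over the block."""
    n = len(block)
    a = sum(block) & 0xFFFF
    b = sum((n - i) * x for i, x in enumerate(block)) & 0xFFFF
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide the window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) & 0xFFFF
    b = (b - n * out_byte + a) & 0xFFFF
    return a, b


def matched_blocks(old, new, n=BLOCK):
    """Return the offsets of old's blocks that occur somewhere in new."""
    # Index every non-overlapping block of the old file by its weak sum.
    index = {}
    for off in range(0, len(old) - n + 1, n):
        index.setdefault(weak_sum(old[off:off + n]), []).append(off)

    found = set()
    if len(new) < n:
        return found

    # Slide a window over the new file one byte at a time.
    a, b = weak_sum(new[:n])
    pos = 0
    while True:
        for off in index.get((a, b), ()):
            # A byte compare stands in for rsync's strong checksum here.
            if new[pos:pos + n] == old[off:off + n]:
                found.add(off)
        if pos + n >= len(new):
            break
        a, b = roll(a, b, new[pos], new[pos + n], n)
        pos += 1
    return found


rng = random.Random(0)
old = bytes(rng.randrange(256) for _ in range(BLOCK * 100))
new = old[:1000] + b"X" + old[1000:]  # one byte inserted mid-file
hits = matched_blocks(old, new)
print(f"{len(hits)} of {len(old) // BLOCK} blocks found in the new file")
```

Because the window moves one byte at a time, blocks after the insertion are
still found at their shifted positions; only the block that contains the new
byte has to be sent.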

Some “archive” formats do use compression, but in a piecewise fashion, so that 
changing one byte of one piece of the archive may cause that entire chunk to 
change, but it might not affect any of the others.  An example of this is the 
Fossil database format.
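
The difference between piecewise and whole-archive compression can be seen
with a small Python sketch. This is only an illustration using zlib with a
made-up 4096-byte piece size; it is not meant to reproduce the Fossil format
or any other real archive layout.

```python
import random
import zlib

CHUNK = 4096  # hypothetical piece size


def compress_pieces(data, size=CHUNK):
    """Compress each fixed-size piece of data independently."""
    return [zlib.compress(data[i:i + size]) for i in range(0, len(data), size)]


# Build some compressible sample data, exactly 16 pieces long.
rng = random.Random(1)
words = [b"alpha", b"bravo", b"charlie", b"delta", b"echo", b"foxtrot"]
before = b" ".join(rng.choice(words) for _ in range(20000))[:CHUNK * 16]

# Flip one bit of one byte somewhere in the middle.
edited = bytearray(before)
edited[20000] ^= 1
after = bytes(edited)

# Piecewise: count how many compressed pieces differ.
pieces_changed = sum(
    x != y for x, y in zip(compress_pieces(before), compress_pieces(after))
)

# Whole-stream: see how much of the compressed output is still shared.
whole_before = zlib.compress(before)
whole_after = zlib.compress(after)
shared = 0
for x, y in zip(whole_before, whole_after):
    if x != y:
        break
    shared += 1

print(f"piecewise: {pieces_changed} of 16 compressed pieces changed")
print(f"whole stream: first {shared} of {len(whole_before)} bytes unchanged")
```

With piecewise compression the damage stays inside one piece, which is the
case where rsync can still skip most of the file. With a single compressed
stream, the output typically diverges from the point of the change onward.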

You can figure out if your archive files work this way by adding -v to your 
rsync command.  It reports the ratio of the total size of the files to the 
number of bytes actually sent and received as “speedup is N”.  An N well 
above 1.0 means it is not re-sending the entire file.  The output of --stats 
gives similar info, more verbosely.
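
If you want to check that number from a script, the last two lines of rsync’s
verbose output can be parsed. The Python sketch below assumes the default
(non-human-readable) summary format, and the sample figures in it are made up
for illustration, not taken from a real run.

```python
import re

# Hypothetical sample of the two summary lines printed by `rsync -v`.
SAMPLE = """\
sent 2,048,000 bytes  received 1,200 bytes  1,366,133.33 bytes/sec
total size is 104,857,600  speedup is 51.17
"""


def parse_rsync_summary(text):
    """Pull the byte counts and the reported speedup out of rsync's summary."""
    traffic = re.search(r"sent ([\d,]+) bytes\s+received ([\d,]+) bytes", text)
    totals = re.search(r"total size is ([\d,]+)\s+speedup is ([\d,.]+)", text)
    if not (traffic and totals):
        raise ValueError("no rsync summary lines found")

    def to_int(s):
        return int(s.replace(",", ""))

    sent, received = to_int(traffic.group(1)), to_int(traffic.group(2))
    total = to_int(totals.group(1))
    return {
        "sent": sent,
        "received": received,
        "total_size": total,
        "reported_speedup": float(totals.group(2).replace(",", "")),
        # Speedup is the total file size over all bytes on the wire.
        "computed_speedup": total / (sent + received),
    }


summary = parse_rsync_summary(SAMPLE)
print(summary)
```

A speedup close to 1 on a file you only touched lightly suggests the format
falls in the whole-archive compression or encryption category.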

The point I made in the original post, however, is that with rsync all this 
work to save network bandwidth comes at a disk I/O and CPU cost, because it 
has no daemon that sits around watching for filesystem change events.  On 
each run it has to read and checksum every file it thinks may have changed.  
The larger the files are relative to the size of the changes, the greater the 
waste.

Always-running software like Dropbox avoids much of this cost because it can 
watch for those events, and thus only do work when the OS tells it that a 
particular file has changed.
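
For contrast, a tool that cannot watch for events has to look at every file
on every run just to find the few that changed. Here is a minimal Python
sketch of such a scan, comparing only size and modification time, which is
roughly what rsync’s default quick check looks at. The function names and the
temporary test files are made up for the example.

```python
import os
import tempfile


def snapshot(root):
    """Walk the whole tree and record (size, mtime) for every file."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def changed_paths(old, new):
    """Paths added, removed, or whose size/mtime differ between two scans."""
    return {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}


with tempfile.TemporaryDirectory() as root:
    for name in ("a.txt", "b.txt", "c.txt"):
        with open(os.path.join(root, name), "w") as f:
            f.write("hello\n")
    first = snapshot(root)

    with open(os.path.join(root, "b.txt"), "a") as f:
        f.write("one more line\n")  # size changes, so the scan notices

    second = snapshot(root)
    changed = changed_paths(first, second)
    print(sorted(changed))
```

The cost of each scan grows with the number of files in the tree, not with
the number of changes, and that is before any changed file is read and
checksummed.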

I have also left out another disadvantage of rsync: it’s basically a one-way 
operation.  If you ever need two-way (or N-way) syncing, you’re better off 
moving to one of the many alternatives that know how to do this correctly.  
Multilateral syncing is surprisingly hard to get right.
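
One reason it is hard: a two-way syncer needs to remember the state at the
last successful sync in order to tell “changed on one side” from “changed on
both”. The Python sketch below shows only that decision step, under the
assumption that each side is summarized as a map from path to content hash.
The names and action strings are hypothetical and do not come from any
particular tool.

```python
def plan(base, left, right):
    """Decide what to do per path, given the state at the last sync (base)
    and the current state of each side. Values are content hashes; a
    missing key means the file does not exist there."""
    actions = {}
    for path in sorted(set(base) | set(left) | set(right)):
        b, l, r = base.get(path), left.get(path), right.get(path)
        if l == r:
            continue  # both sides already agree (including both deleted)
        if l == b:
            # Only the right side changed since the last sync.
            actions[path] = "copy right -> left" if r is not None else "delete on left"
        elif r == b:
            # Only the left side changed since the last sync.
            actions[path] = "copy left -> right" if l is not None else "delete on right"
        else:
            # Both sides changed, in different ways.
            actions[path] = "conflict"
    return actions


base = {"a": "1", "b": "1", "c": "1"}
left = {"a": "2", "b": "1", "c": "3"}   # edited a and c
right = {"a": "1", "c": "4"}            # deleted b, edited c
result = plan(base, left, right)
print(result)
```

Running rsync once in each direction has no such memory, so it cannot tell a
deletion on one side from a new file on the other, and it cannot detect the
conflict on "c" at all.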

I don’t mean to advertise for Dropbox, just to give it as an example that 
everyone can relate to.

An alternative that’s open source, more secure, and definitely does pay 
attention to the OS’s filesystem event API is SpiderOak.  You can see from 
their GitHub contents that they’ve got OS-specific file change notifiers:


Now contrast Syncthing, which has many of the same virtues, but currently 
doesn’t have file change notification built in, so a third party wrote a 
helper for Syncthing to fill the gap:


These tables may be helpful:

