Arthur Corliss wrote:
You had me excited at first, but then the home page said:

To update non-RCS files, CVSup uses the highly efficient rsync algorithm,
  developed by Andrew Tridgell and Paul Mackerras.

Looks like its speed benefits are due to knowledge of specific file types
(RCS and log files) so it can grab just the new content for transfer. For
all other types it falls back onto rsync, which they say is built into
CVSup.

Er, not exactly. Read
http://www.cvsup.org/howsofast.html

From what I can see, cvsup applies the rsync algorithm on a file-by-file basis, using only the differential-send part of it. It doesn't rsync the whole tree, which is what I understood the original problem to be (wasn't the complaint about the flood of stats?).

So if you want a tool that works well for large mirrors, the priority apparently should be to cut down the "lots of stats" pass, which is what determines exactly which files need to be considered for checking. (Rsync already makes sure all the *other* I/O operations are minimized.)
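
To make the "lots of stats" cost concrete, here is a minimal Python sketch of a full-tree scan. It is not how rsync or cvsup is implemented; the function names and the (size, mtime) signature are assumptions chosen for illustration. The point is only that every file gets one stat call, whether or not it changed.

```python
import os

def scan_tree(root):
    """Stat every file under root; return {relative_path: (size, mtime_ns)}.

    Hypothetical helper: the cost is one stat() per file in the tree,
    regardless of how few files actually changed.
    """
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)  # paid for every file, changed or not
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state

def candidates(old_state, new_state):
    """Paths that are new, or whose size or mtime differ from the old scan."""
    return sorted(p for p, sig in new_state.items() if old_state.get(p) != sig)
```

On a mirror with millions of files and a handful of daily changes, nearly all of the work in scan_tree() goes into files that end up not being candidates.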

Now the key, as I see it, is that unlike all the other use cases for rsync, large mirrors are likely to have their directories transferred directly from another mirror. So the client that pulled the tree update down could store a list of changed files, and the server could then just use that list to determine which files need to be synced to the downstream mirror. (Sure, the original site has to generate the list, but if they use a tool like PAUSE to upload the files, that shouldn't be hard to do.)
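
A rough Python sketch of that idea, under stated assumptions: the changelist is a plain text file with one relative path per line, and both function names are made up for illustration. Neither rsync nor cvsup writes such a list by itself as far as I know, although rsync's --files-from option can consume one.

```python
import os

def record_changes(received_paths, changelist_path):
    """After pulling from upstream, save the paths that the pull updated.

    Hypothetical format: one relative path per line, sorted, no duplicates.
    """
    with open(changelist_path, "w", encoding="utf-8") as fh:
        for path in sorted(set(received_paths)):
            fh.write(path + "\n")

def files_to_offer(root, changelist_path):
    """Work out what to sync downstream from the changelist alone.

    Only the listed paths are looked at on disk; the rest of the tree
    is never walked. Listed paths that no longer exist are returned
    separately so the downstream mirror can delete them.
    """
    with open(changelist_path, encoding="utf-8") as fh:
        listed = [line.rstrip("\n") for line in fh if line.strip()]
    present = [p for p in listed if os.path.exists(os.path.join(root, p))]
    removed = [p for p in listed if p not in present]
    return present, removed
```

The number of filesystem checks now scales with the size of the update rather than the size of the mirror, which is the whole point of the suggestion.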
