Re: Trimming the CPAN - "Automatic Purging"

Matija Grabnar Wed, 31 Mar 2010 03:47:50 -0700

Arthur Corliss wrote:

You had me excited at first, but then the home page said:
To update non-RCS files, CVSup uses the highly efficient rsync algorithm,
  developed by Andrew Tridgell and Paul Mackerras.

Looks like its speed benefits are due to knowledge of specific file types
(RCS and log files) so it can grab just the new content for transfer. For
all other types it falls back onto rsync, which they say is built into
CVSup.

Er, not exactly. Read
http://www.cvsup.org/howsofast.html

From what I can see, cvsup uses the rsync algorithm on a file-by-file basis (it uses just the differential send part of the rsync algorithm). It doesn't rsync the whole tree, which was what I understood to be the original problem (wasn't the complaint about the flood of stats?).
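To make "just the differential send part" concrete, here is a rough Python sketch of a per-file block delta. It is not CVSup's or rsync's actual code: the names signature, delta and patch are made up for illustration, and it hashes the window with MD5 at every offset, where real rsync pairs a cheap rolling checksum with a stronger hash. The block size is tiny only so the example is easy to follow.

```python
import hashlib

BLOCK = 4  # tiny block size, for illustration only


def signature(old: bytes) -> dict:
    """Receiver side: map block digest -> block index for each full block of the old file."""
    sig = {}
    for i in range(0, len(old) - BLOCK + 1, BLOCK):
        sig.setdefault(hashlib.md5(old[i:i + BLOCK]).digest(), i // BLOCK)
    return sig


def delta(new: bytes, sig: dict) -> list:
    """Sender side: encode the new file as ('copy', block_index) and ('data', bytes) ops."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + BLOCK]
        # Only a full-size window can match a block the receiver already has.
        idx = sig.get(hashlib.md5(window).digest()) if len(window) == BLOCK else None
        if idx is None:
            literal.append(new[pos])  # no match here: this byte must be sent
            pos += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", idx))  # receiver reuses its own block
            pos += BLOCK
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list) -> bytes:
    """Receiver side: rebuild the new file from the old file plus the ops."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg * BLOCK:(arg + 1) * BLOCK] if kind == "copy" else arg
    return bytes(out)
```

With old = b"abcdefghijkl" and new = b"abcdXXefghijkl", only the two inserted bytes go over the wire as literal data; everything else is a reference to a block the receiver already holds. Note that none of this touches the directory tree, it only helps once you already know which file to look at.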

So if you want to make a tool that works fine for large mirrors, your priority apparently should be to reduce the "lots of stats" part which is used to determine exactly what files need to be considered for checking. (Rsync already makes sure all the *other* I/O operations are minimized).

Now the key, as I see it, is that unlike all the other use cases where rsync is used, large mirrors are likely to have their directories directly transferred from another mirror. So, the client that pulled the tree update down could store a list of changed files, and the server could then just use that list to determine which files need to be synced to the downstream mirror. (Sure, the original site has to generate the list, but if they use a tool like PAUSE to upload the files, that shouldn't be hard to do).
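A minimal sketch of that idea in Python, under some assumptions of mine: the changelist is a plain text file with one relative path per line, and the function names record_changes and candidates are hypothetical, not part of rsync, CVSup or PAUSE. Paths containing newlines are ignored for simplicity.

```python
import os


def record_changes(received_paths, list_path):
    """Pulling client: after an update, append the paths that changed to the list."""
    with open(list_path, "a") as fh:
        for rel in received_paths:
            fh.write(rel + "\n")


def candidates(root, list_path):
    """Serving side: stat only the listed paths instead of walking the whole tree.

    Returns {relative_path: (size, mtime_ns)}, with None for paths that no
    longer exist (assumed here to mean they were deleted upstream).
    """
    result = {}
    with open(list_path) as fh:
        for line in fh:
            rel = line.rstrip("\n")
            if not rel:
                continue
            try:
                st = os.stat(os.path.join(root, rel))
                result[rel] = (st.st_size, st.st_mtime_ns)
            except FileNotFoundError:
                result[rel] = None
    return result
```

The number of stat calls then scales with the size of the update rather than the size of the mirror, which is the part the per-file delta does nothing for.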
