On Tue, 30 Mar 2010, Matija Grabnar wrote:

Er, not exactly. Read
http://www.cvsup.org/howsofast.html

I had read http://www.cvsup.org/faq.html#features item #3.

From what I can see, cvsup uses the rsync algorithm on a file-by-file basis (it uses just the differential send part of the rsync algorithm). It doesn't rsync the whole tree, which was what I understood to be the original problem (wasn't the complaint about the flood of stats?).
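To make "just the differential send part" concrete: the per-file piece of rsync is the weak rolling checksum, which lets the sender slide a block-sized window through a file one byte at a time at O(1) cost per step. The sketch below follows the formulas in the rsync technical report (modulus 2^16). The function names are illustrative, not taken from rsync or cvsup. Nothing here touches the directory tree, which is the point being made above.

```python
M = 1 << 16  # modulus used by the weak checksum in the rsync technical report


def weak_checksum(block: bytes) -> tuple:
    """Compute (a, b) for one block from scratch.

    a = sum of bytes, b = sum of bytes weighted by distance from the block end.
    """
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple:
    """Slide the window one byte: drop out_byte at the front, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def combined(a: int, b: int) -> int:
    """Pack (a, b) into the single 32-bit value s = a + 2^16 * b."""
    return a + (b << 16)
```

The rolling update is what makes the per-file delta cheap; a strong hash (MD4/MD5 in rsync) is only computed when the weak value matches a block the receiver already has.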

Sounds like I may have interpreted the FAQ incorrectly, then. Thanks for
pointing that out. I have a few questions, though. The explanation says:

   "At the same time, the Tree Differ generates a list of the server's
   files."

That seems to imply that it's doing exactly the same thing as rsync, so the server still has to do all the stats, right?
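For reference, here is roughly what "generates a list of the server's files" costs when no state is kept between requests. This is a minimal sketch, assuming the list is built by a fresh walk of the tree on every request; `scan_tree` and `tree_diff` are hypothetical names, not cvsup's Tree Differ. The counter shows that the number of metadata lookups grows with the size of the tree, not with the number of files that actually changed.

```python
import os
import stat


def scan_tree(root: str):
    """Walk root and return ({relpath: (size, mtime_ns)}, number of lstat calls).

    One lstat() per directory entry: this is the "stat storm".
    """
    listing = {}
    stat_calls = 0
    stack = [root]
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                st = os.lstat(entry.path)  # one metadata lookup per entry
                stat_calls += 1
                if stat.S_ISDIR(st.st_mode):
                    stack.append(entry.path)
                elif stat.S_ISREG(st.st_mode):
                    rel = os.path.relpath(entry.path, root)
                    listing[rel] = (st.st_size, st.st_mtime_ns)
    return listing, stat_calls


def tree_diff(server: dict, client: dict) -> list:
    """Paths whose (size, mtime) differ on the client, or are missing there."""
    return sorted(p for p, meta in server.items() if client.get(p) != meta)
```

Even if only one file changed upstream, a mirror with N entries pays N lstat calls per connecting client.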

Nowhere do I see it mentioning that the daemon maintains state between
requests. The primary speed-ups (beyond special file update handling) are
better use of bidirectional bandwidth.

Do you have access to a cvsup server so you can verify its behavior?

So if you want to make a tool that works fine for large mirrors, your priority apparently should be to reduce the "lots of stats" part which is used to determine exactly what files need to be considered for checking. (Rsync already makes sure all the *other* I/O operations are minimized).

Agreed.

Now the key, as I see it, is that unlike all the other use cases where rsync is used, large mirrors are likely to have their directories transferred directly from another mirror. So the client that pulled the tree update down could store a list of changed files, and the server could then just use that list to determine which files need to be synced to the downstream mirror. (Sure, the original site has to generate the list, but if they use a tool like PAUSE to upload the files, that shouldn't be hard to do.)
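A minimal sketch of the change-list idea, assuming the upstream sync client appends one path per line to a log file and each downstream mirror remembers the byte offset it last read. `record_changes` and `changes_since` are hypothetical helpers, not part of rsync, cvsup or PAUSE, and the format assumes no newlines in file names. Only the log is read; the tree is never walked.

```python
import os


def record_changes(log_path: str, paths) -> None:
    """Append paths touched by an upstream sync, one per line (hypothetical format)."""
    with open(log_path, "a", encoding="utf-8", newline="\n") as log:
        for p in paths:
            log.write(p + "\n")


def changes_since(log_path: str, offset: int):
    """Return (unique paths appended after byte offset, new offset).

    Only complete lines are consumed; a half-written final line is left
    for the next call.
    """
    if not os.path.exists(log_path):
        return [], offset
    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(lines)), offset + end
```

The server would then run the per-file rsync delta only on the returned paths, so the metadata cost scales with the number of changes instead of the size of the tree.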

Agreed, but I'm not sure that gets us past the stat storm on the server.

        --Arthur Corliss
          Live Free or Die
