On Tue, 30 Mar 2010, Matija Grabnar wrote:
Er, not exactly. Read
http://www.cvsup.org/howsofast.html
I had read http://www.cvsup.org/faq.html#features item #3.
From what I can see, cvsup uses the rsync algorithm on a file-by-file basis
(it uses just the differential send part of the rsync algorithm). It doesn't
rsync the whole tree, which was what I understood to be the original problem
(wasn't the complaint about the flood of stats?).
Sounds like I may have interpreted the FAQ incorrectly, then. Thanks for
pointing that out. I have a few questions, though. The explanation says:
"At the same time, the Tree Differ generates a list of the server's
files."
That seems to imply that it's doing the exact same thing as rsync, so all
the stats are still present on the server, right?
Nowhere do I see it mentioning that the daemon is maintaining state between
requests. The primary speed-up (beyond special file update handling) is
better use of bidirectional bandwidth.
Do you have access to a cvsup server so you can verify its behavior?
So if you want to make a tool that works fine for large mirrors, your
priority apparently should be to reduce the "lots of stats" part which is
used to determine exactly what files need to be considered for checking.
(Rsync already makes sure all the *other* I/O operations are minimized).
Agreed.
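
To put a rough number on the "lots of stats" part, here is a minimal sketch
that walks a tree and lstat()s every entry, the way a full tree comparison
has to. The function name count_stats is made up for illustration, and this
is not how rsync or cvsup is actually implemented; it only shows that the
cost grows with the size of the tree, not with the number of changed files.

```python
import os

def count_stats(root):
    """Walk root and lstat every entry; return how many lstat calls were made.

    Hypothetical helper: it mimics the per-entry metadata check of a full
    tree scan, nothing more.
    """
    calls = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))  # one stat per entry
            calls += 1
    return calls
```

Running it against a mirror root gives the number of stats a single full
scan costs, and every downstream client that connects triggers that again.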
Now the key, as I see it, is that unlike all the other use cases where rsync
is used, large mirrors are likely to have their directories directly
transferred from another mirror. So, the client that pulled the tree update
down could store a list of changed files, and the server could then just use
that list to determine which files need to be synced to the downstream
mirror. (Sure, the original site has to
generate the list, but if they use a tool like PAUSE to upload the files,
that shouldn't be hard to do).
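
A minimal sketch of that idea, assuming the pulling client appends one line
per pull to a manifest file and the serving side reads only that manifest.
The names record_changes and files_to_send, the JSON-lines format, and the
"pulls seen" counter are all assumptions made for illustration; no existing
tool is known to work exactly this way.

```python
import json

def record_changes(manifest_path, changed_paths):
    """Client side: append the paths updated by one pull as one JSON line."""
    with open(manifest_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(sorted(changed_paths)) + "\n")

def files_to_send(manifest_path, pulls_seen_by_downstream):
    """Server side: return paths changed in pulls the downstream mirror has
    not seen yet. Only the manifest is read; the mirrored tree itself is
    never walked or stat'ed.
    """
    pending = set()
    with open(manifest_path, encoding="utf-8") as fh:
        for index, line in enumerate(fh):
            if index >= pulls_seen_by_downstream:
                pending.update(json.loads(line))
    return sorted(pending)
```

The resulting list could then be handed to something like rsync's
--files-from option, so that only the listed files get examined. Whether
that is enough to avoid the scan on the server depends on the server
trusting the manifest.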
Agreed, but I'm not sure we've gotten past the stat storm on the server.
--Arthur Corliss
Live Free or Die