Re: Trimming the CPAN - "Automatic Purging"

Arthur Corliss Mon, 29 Mar 2010 00:50:41 -0700

On Sun, 28 Mar 2010, Nicholas Clark wrote:

Are you running a large public mirror site, where you don't even have
knowledge of who is mirroring from you?


(Not even knowledge, let alone channels of communication with, let alone
control over)

Because (as I see it, not having done any of this) the logistics of that is
going to have as much bearing on trying to change protocols as the actual
technical merits of the protocol itself.


I do run mirrors and am mirrored from.  Not on the scale of CPAN (in terms
of file count), but having been long aware of the effect of rsync servers I
have explored the scalability aspects of it.

It should have been obvious that trying to facilitate a cut-over to a new
syncing tool can't be done on this scale in one fell swoop.  Obviously,
there'd have to be a gradual migration where protocols are supported
concurrently, much like FTP & rsync are currently both supported.  We add a
new option and encourage people to move over.  Since we already have a list
of the public mirrors we should have some idea of where to start that
conversation.

Most of the cost of rsync is an externality to the clients. If one has an
existing mirror, one is using rsync to keep it up to date, what's the
incentive to change?


Common sense and professional courtesy.  Especially because it's likely that
some "clients" running public mirrors may be a sync source for some private
mirrors.  They may not feel the pain of the master repositories, but they
certainly share a portion.  And it's not likely that many mirrors have a
capital budget to support scaling a free service, so it would be best to
make efficient use of those resources.

I'm missing something here, I suspect. How can HTTP be more efficient than
rsync? The only obvious method to me of mirroring a CPAN site by HTTP is to
instruct a client (such as wget) to get it all. In which case, in the course
of doing this the client is going to recurse over the entire directory tree
of the server, which, I thought, was functionally equivalent to the behaviour
of the rsync server.


You are missing something, but I may not have been explicit enough.  HTTP or
FTP can easily be the payload transport, once you know the precise files
that need to be transferred.  That is tremendously more efficient than what
rsync does on the server.  So, use rsync (or FTP mgets, etc.) to transfer
your transaction logs, compile a list of new files to retrieve, and use the
very common and low-overhead protocols to transfer the files...
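
As a rough illustration of that flow, here is a minimal Python sketch.  It
assumes a purely hypothetical transaction log format (one entry per line:
`<epoch> <new|delete> <path>`), and the function names and the base URL
argument are made up for the example.  Nothing here reflects an existing CPAN
log or tool.  The client reads the log, works out which files changed since
its last sync, and then requests only those files over plain HTTP, so the
server never has to walk its tree on the client's behalf.

```python
import os
import urllib.request
from urllib.parse import quote


def compile_fetch_list(log_lines, since):
    """Return (to_fetch, to_delete) for log entries newer than `since`.

    Assumed (hypothetical) log format, one entry per line:
        <epoch> <new|delete> <path>
    """
    state = {}  # path -> last action seen, in log order
    for line in log_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        epoch, action, path = line.split(None, 2)
        if int(epoch) <= since:
            continue  # already handled by an earlier sync
        state[path] = action
    to_fetch = sorted(p for p, a in state.items() if a == "new")
    to_delete = sorted(p for p, a in state.items() if a == "delete")
    return to_fetch, to_delete


def fetch_files(base_url, paths, dest_root):
    """Fetch each listed path over HTTP into dest_root (one GET per file)."""
    for path in paths:
        target = os.path.join(dest_root, path)
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        url = base_url.rstrip("/") + "/" + quote(path)
        urllib.request.urlretrieve(url, target)
```

A client would call compile_fetch_list() on the freshly transferred log with
the timestamp of its previous run, pass the first list to fetch_files(), and
unlink the paths in the second.  A real tool would also want to reject paths
containing ".." and verify checksums, both of which are left out here.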

        --Arthur Corliss
          Live Free or Die
