On Sat, 27 Mar 2010, Elaine Ashton wrote:
Actually, I thought I was merely offering my opinion both as the sysadmin for
the canonical CPAN mothership and as an end-user. If that makes me a prick,
well, I suppose I should go out and buy one :)
:-) You'll have to pardon my indiscriminate epithets. The barbs are coming
from multiple directions. My point still stands, however. Your experience,
however worthy, has zero bearing on whether or not my experience is
just as worthy. Even more so when you guys have zero clue who you're talking
to. And you shouldn't have to know. I would have thought simple communal
and professional courtesy would be extended and all points considered in
earnest. Which does not appear to be the case.
And you're disregarding a considerable problem that rsync is a well-established
tool for mirroring that is easy to use and works on a very wide range of
platforms. Asking mirror ops to adopt a new tool for mirroring one mirror, when
they often have several or more, likely won't be met with much enthusiasm and
would create two tiers of CPAN mirrors, those using rsync and those not, which
would not only complicate something which should remain simple but, again,
doesn't address the size of the archive and the multitude of small files that
are always a consideration no matter what you're serving them up with.
Ah, you're one of them. All objects look like nails when all you have is a
hammer, eh? Rsync is a good tool, but like Perl, it isn't the perfect tool
for all tasks. You've obviously exceeded what the tool was designed for;
it's only logical to look for (or write) another tool. Ironically, what I'm
suggesting is so basic that rsync can be replaced by a script which will
likely run on every mirror out there with no more fuss than rsync.
FTP? It's 2010 and very few corp firewalls allow ftp in or out. I can't
remember the last time I even used ftp come to think of it. I had to go through
2 layers of network red tape just to get rsync for a particular system I wanted
to mirror CPAN to at work. Asking for FTP would have been met with a big no or
a cackle, depending on which of the nyetwork masters got the request first.
Sounds like you may be hamstrung by your own bureaucracy, but that's rarely
the case in most of the places I've worked. Not to mention that between
passive mode FTP or even using an HTTP proxy (most of which support FTP
requests) what I'm proposing is relatively painless, simple, and easy to
secure. This concern I suspect is a non-issue for most mirror operators.
Even if it was, allow them to pull it via HTTP for all I care. Either one
is significantly more efficient than rsync.
How is replacing rsync, a standard and widely used tool, simpler for mirror
ops? I suppose I don't understand the opposition to trimming off the obvious
cruft on CPAN to lighten the load when BackPAN exists to archive them. There is
already CPAN::Mini (which was created back when CPAN was an ever-so-tiny 1.2GB)
so it's not as though lightening the load is a new idea or an unwelcome one.
I'm not opposed to trimming the cruft, but I am opposed to ignorant
knee-jerk reactions bereft of any empirical data (or at least none that
you've shared). The cruft, while being cruft, isn't inherently evil. You have a
basic I/O and state problem. And the I/O generated is predominantly caused
by rsync trying to (re)assemble state on the file set, *per* request. More
appallingly, most of that state image being generated is state that hasn't
changed in quite a while. Literally years in many cases. So why are we
wasting cycles & I/O performing massively redundant work?
That's why having PAUSE implement a transaction log, and perhaps a cron job
on the master server doing daily checkpointed file manifests is so much more
efficient. An in-sync mirror only needs to download the latest transaction
logs and play them forward (delete certain files, download others, etc).
And, gee, just about every author on the list could write *that* sync agent
in an evening. Out-of-sync mirrors can start by working off the checkpoint
manifest, getting what's missing, and rolling forward.
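
To make that concrete, here's a rough sketch of the planning half of such a
sync agent. Everything specific in it is my assumption for illustration:
PAUSE publishes no such log today, so the line format
("<sequence> <ADD|DEL> <path>"), the function names, and the idea of a
manifest as a path-to-checksum mapping are all hypothetical. It only works
out what to fetch and what to delete; the actual transfer could be FTP or
HTTP.

```python
# Hypothetical transaction log format (an assumption, not a PAUSE feature):
#   <sequence> <ADD|DEL> <relative path>
# one entry per line, sequence numbers strictly increasing.

def parse_log(text):
    """Parse log text into a list of (sequence, operation, path) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        seq, op, path = line.split(None, 2)
        entries.append((int(seq), op, path))
    return entries


def plan_from_log(entries, last_seq):
    """In-sync mirror: play forward every entry newer than last_seq.

    Returns (paths_to_fetch, paths_to_delete, new_last_seq). A file that
    was added and then deleted within the window is never fetched; it
    ends up only in the delete list, so the agent should tolerate
    deleting a file it does not have.
    """
    fetch, delete = set(), set()
    new_seq = last_seq
    for seq, op, path in sorted(entries):
        if seq <= last_seq:
            continue  # already applied on this mirror
        if op == "ADD":
            fetch.add(path)
            delete.discard(path)
        elif op == "DEL":
            delete.add(path)
            fetch.discard(path)
        else:
            raise ValueError("unknown operation: %r" % op)
        new_seq = seq
    return sorted(fetch), sorted(delete), new_seq


def plan_from_manifest(manifest, local):
    """Out-of-sync mirror: compare against a checkpoint manifest.

    Both arguments map relative path -> checksum. Anything missing or
    different locally is fetched; anything not in the manifest is deleted.
    """
    fetch = sorted(p for p, digest in manifest.items()
                   if local.get(p) != digest)
    delete = sorted(p for p in local if p not in manifest)
    return fetch, delete
```

An in-sync mirror would remember the last sequence number it applied and
call plan_from_log() on each new log; a stale or brand-new mirror would call
plan_from_manifest() once against the latest checkpoint and then roll
forward through the logs written after it. Either way the master never has
to walk its whole file tree on behalf of a single client.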
What you're overlooking is that CPAN has grown, and will continue to grow.
Even if you remove the cruft now, at some point it might grow to the same
size, just with fresh files. When that happens, you're right back where you
are now. Rsync can't cut it; it wasn't designed for this.
Whether you like it or not, even on a pared-down CPAN, rsync is easily your
most inefficient process on the server. If you're not willing to optimize
that, then you really don't care about optimization at all.
--Arthur Corliss
Live Free or Die