Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread Andy Armstrong
On 27 Mar 2010, at 00:59, Elaine Ashton wrote:
 The only snag I can foresee in trimming back on the abundance of modules is 
 the case where some modules have version requirements for other modules where 
 it will barf with a mismatch/newer version of the required module (I bumped 
 into this recently but can't remember exactly which module it was) but I 
 think it's rare and the practice should be discouraged.


Maybe that could be solved by having the clients (and maybe search.cpan.org) 
automagically fall back to a backpan mirror?
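
A minimal sketch of what such a client-side fallback might look like, in Python. The mirror URLs are placeholders, and `fetch_with_fallback` and its `fetch` callback are hypothetical names used only for illustration; no existing CPAN client is claimed to work this way.

```python
def fetch_with_fallback(dist_path, mirrors, fetch):
    """Try each mirror in order and return (url, data) from the first hit.

    `fetch` is any callable that takes a URL and returns the file's bytes,
    or None if that mirror does not have the file.
    """
    for base in mirrors:
        url = base.rstrip("/") + "/" + dist_path.lstrip("/")
        data = fetch(url)
        if data is not None:
            return url, data
    raise LookupError(f"{dist_path} not found on any mirror")


# Placeholder hosts: a regular CPAN mirror first, a BackPAN mirror last.
MIRRORS = ["http://cpan.example.org", "http://backpan.example.org"]
```

Passing the download function in as `fetch` keeps the ordering logic separate from the transport, so the same loop would work whether a client pulls over HTTP or something else.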

And, yes, if it's considered a good idea I /am/ prepared to do something about 
it.

-- 
Andy Armstrong, Hexten





Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread nadim khemir
 On 27 Mar 2010, at 00:59, Andy Armstrong wrote:

  On 27 Mar 2010, at 00:59, Elaine Ashton wrote:
  The only snag I can foresee in trimming back on the abundance of modules is 
the case where some modules have version requirements for other modules where 
it will barf with a mismatch/newer version of the required module (I bumped 
into this recently but can't remember exactly which module it was) but I think 
it's rare and the practice should be discouraged.
 
 
 Maybe that could be solved by having the clients (and maybe search.cpan.org) 
automagically fall back to a backpan mirror?
 
 And, yes, if it's considered a good idea I am prepared to do something about 
it.

Exactly what I wrote in my previous mail; nobody commented, so I was wondering 
if I was wrong!

In any case, we now have a better understanding of the problem and, most 
importantly, we have a real user (Elaine) wishing for something to be done.

Andreas, Chris, Tatsuhiko and others have done a tremendous job implementing 
stuff, but I must admit that I would have liked to see a list of what they are 
implementing, not to mention a context diagram. IMVHO the first thing we 
should do is write a requirements list of what the CPAN actors (clients, 
PAUSE, mirrors, search engines, ...) should do. Maybe that document already 
exists somewhere. 

The implications for CPAN, ExtUtils, Module::Build, and all the other, still 
unknown, modules are, I believe, not to be underestimated.

Andy (since you are the first to really volunteer (and now you don't have any 
choice anymore;)), count me in for whatever development time is needed to get 
things moving. 

Ask, this thread is getting a tad long and, although I'm very happy to see more 
input, requirements and ideas, would it be possible to see some condensed 
results somewhere?

Cheers, Nadim.



Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread Arthur Corliss

On Sat, 27 Mar 2010, Nicholas Clark wrote:


I

You?

Or someone else?


I am quite happy to agree that your understanding and experience of storage
management is better than mine. But that's not the key question, in a
volunteer organisation. The questions I ask, repeating Jan's comments in
another message, are.


Oh, I understand that fully.  And I'd be happy to lend some of my time.  But
you don't make people inclined to help when people are lobbing snarky
comments like "we'll wait breathlessly for you to do it".  The impression
I'm getting from most of you right now is that you're hell bent on solving
the problem your way, and no one is interested in exploring the technical
merits of other approaches.

Hell, I would even help with work towards your desired method *if* I thought
that was the consensus after a genuine exchange and consideration of ideas.
I definitely won't should it appear that we have some kind of elitist cabal
that will make their decision in isolation.  If that's going to be the case
then this should have never been raised on an open forum like the module
author's list.

Quite frankly, at times some discussions on this list fail the concept of a
technical meritocracy, and tend towards an established aristocracy.

--Arthur Corliss
  Live Free or Die


Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread David Cantrell
On Fri, Mar 26, 2010 at 03:02:22PM -0800, Arthur Corliss wrote:
 Why use rsync, then?  Why not have checkpointed logs on cpan with
 additions/removals logged by date so you can roll forward on the client,
 processing only those files?  It would be trivial to set up and a lot more
 efficient.

Because the most important mirror sites mirror CPAN as just a very small
part of what they do.  They won't want to have to use weird tools for
just that tiny corner of their disk.

-- 
David Cantrell | London Perl Mongers Deputy Chief Heretic

I caught myself pulling grey hairs out of my beard.
I'm definitely not going grey, but I am going vain.


Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread Jarkko Hietaniemi
  Oh, I understand that fully.  And I'd be happy to lend some of my time.  But
  you don't make people inclined to help when people are lobbing snarky
  comments like we'll wait breathlessly for you to do it.


The time-honored tradition of many open source communities is to talk. 
And talk.  And talk.  The problem is that this solves nothing.  To do, does.


You are free to decide to take this as a personal insult.



Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread Arthur Corliss

On Sat, 27 Mar 2010, Jarkko Hietaniemi wrote:

The time-honored tradition of many open source communities is to talk. And 
talk.  And talk.  The problem is that this solves nothing.  To do, does.


You are free to decide to take this as a personal insult.


I didn't take it as an insult, I took it as what it was -- a dodge.  You
already have your minds made up and are not willing to evaluate options
on their merits.

Let's just be honest about what's going on here.

--Arthur Corliss
  Live Free or Die


Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread Arthur Corliss

On Sat, 27 Mar 2010, Elaine Ashton wrote:


Actually, I thought I was merely offering my opinion both as the sysadmin for 
the canonical CPAN mothership and as an end-user. If that makes me a prick, 
well, I suppose I should go out and buy one :)


:-) You'll have to pardon my indiscriminate epithets.  The barbs are coming
from multiple directions.  My point still stands, however.  Your experience,
however worthy, has zero bearing on whether or not my experience is
just as worthy.  Even more so when you guys have zero clue who you're talking
to.  And you shouldn't have to know.  I would have thought simple communal 
and professional courtesy would be extended and all points considered in 
earnest.  Which does not appear to be the case.



And you're disregarding a considerable problem that rsync is a well-established 
tool for mirroring that is easy to use and works on a very wide range of 
platforms. Asking mirror ops to adopt a new tool for mirroring one mirror, when 
they often have several or more, likely won't be met with much enthusiasm and 
would create two tiers of CPAN mirrors, those using rsync and those not, which 
would not only complicate something which should remain simple but, again, 
doesn't address the size of the archive and the multitude of small files that 
are always a consideration no matter what you're serving them up with.


Ah, you're one of them.  All objects look like nails when all you have is a
hammer, eh?  Rsync is a good tool, but like Perl, it isn't the perfect tool
for all tasks.  You've obviously exceeded what the tool was designed for;
it's only logical to look for (or write) another tool.  Ironically, what I'm 
suggesting is so basic that rsync can be replaced by a script which will 
likely run on every mirror out there with no more fuss than rsync.



FTP? It's 2010 and very few corp firewalls allow ftp in or out. I can't 
remember the last time I even used ftp come to think of it. I had to go through 
2 layers of network red tape just to get rsync for a particular system I wanted 
to mirror CPAN to at work. Asking for FTP would have been met with a big no or 
a cackle, depending on which of the nyetwork masters got the request first.


Sounds like you may be hamstrung by your own bureaucracy, but that's rarely
the case in most of the places I've worked.  Not to mention that between
passive mode FTP or even using an HTTP proxy (most of which support FTP
requests) what I'm proposing is relatively painless, simple, and easy to
secure.  This concern I suspect is a non-issue for most mirror operators.
Even if it was, allow them to pull it via HTTP for all I care.  Either one
is significantly more efficient than rsync.


How is replacing rsync, a standard and widely used tool, simpler for mirror 
ops? I suppose I don't understand the opposition to trimming off the obvious 
cruft on CPAN to lighten the load when BackPAN exists to archive them. There is 
already CPAN::Mini (which was created back when CPAN was an ever-so-tiny 1.2GB) 
so it's not as though lightening the load is a new idea or an unwelcome one.


I'm not opposed to trimming the cruft, but I am opposed to ignorant
knee-jerk reactions bereft of any empirical data (or at least none that
you've shared).  The cruft, while being cruft, isn't inherently evil.  You
have a basic I/O and state problem.  And the I/O generated is predominantly
caused by rsync trying to (re)assemble state on the file set, *per* request.
More appallingly, most of that state image being generated is state that
hasn't changed in quite a while.  Literally years in many cases.  So why are
we wasting cycles and I/O performing massively redundant work?

That's why having PAUSE implement a transaction log, and perhaps a cron job
on the master server doing daily checkpointed file manifests, is so much more
efficient.  An in-sync mirror only needs to download the latest transaction
logs and play them forward (delete certain files, download others, etc).
And, gee, just about every author on the list could write *that* sync agent
in an evening.  Out-of-sync mirrors can start by working off the checkpoint
manifest, get what's missing, and roll forward.
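
As a rough illustration of the roll-forward idea above, here is a minimal Python sketch. Everything concrete in it is an assumption made for the example: the one-entry-per-line log format (`<seq> <add|delete> <path>`), the sequence numbers, and the names `LogEntry`, `parse_log`, `plan_from_log` and `plan_from_checkpoint` are hypothetical, and nothing here describes what PAUSE actually produces. It only works out which files a mirror would download or delete; the transfer itself (HTTP, FTP or otherwise) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    seq: int      # position in the log (assumed to increase monotonically)
    action: str   # "add" or "delete"
    path: str     # file path relative to the archive root


def parse_log(text):
    """Parse lines of the assumed form '<seq> <add|delete> <path>'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        seq, action, path = line.split(None, 2)
        if action not in ("add", "delete"):
            raise ValueError(f"unknown action: {action!r}")
        entries.append(LogEntry(int(seq), action, path))
    return entries


def plan_from_log(local_files, entries, last_seq):
    """In-sync mirror: replay only the entries newer than last_seq.

    Returns (paths to download, paths to delete, new last_seq).
    """
    to_fetch, to_delete = set(), set()
    for entry in sorted(entries, key=lambda e: e.seq):
        if entry.seq <= last_seq:
            continue  # already applied on an earlier run
        if entry.action == "add":
            to_fetch.add(entry.path)
            to_delete.discard(entry.path)
        else:
            to_fetch.discard(entry.path)
            if entry.path in local_files:
                to_delete.add(entry.path)
        last_seq = entry.seq
    return to_fetch, to_delete, last_seq


def plan_from_checkpoint(local_files, manifest):
    """Out-of-sync mirror: diff the local file set against a full manifest.

    Returns (paths to download, paths to delete).
    """
    return manifest - local_files, local_files - manifest
```

A mirror that knows the last sequence number it applied would call `plan_from_log` on each run; one that has fallen too far behind would call `plan_from_checkpoint` once against the daily manifest and then resume replaying the log from the checkpoint's sequence number.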

What you're overlooking is that CPAN has grown, and will continue to grow.
Even if you remove the cruft now, at some point it might grow to the same size,
just with fresh files.  When that happens, you're right back where you are 
now.  Rsync can't cut it; it wasn't designed for this.


Whether you like it or not, even on a pared down CPAN rsync is easily your
most inefficient process on the server.  If you're not willing to optimize
that, then you really don't care about optimization at all.

--Arthur Corliss
  Live Free or Die