Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Nicholas Clark
On Sat, Mar 27, 2010 at 10:52:05AM -0800, Arthur Corliss wrote:

 I think I was quite explicit in saying that efficiencies should be pursued
 in multiple areas, but the predominant bitch I took away from your thread
 dealt with the burden of synchronizing mirrors.  What's the easiest way to
 address that pain?  I don't believe it's your method.  I'd look into the
 size issue *after* you address the incredible inefficiencies of a simple
 rsync.

I

You?

Or someone else?


I am quite happy to agree that your understanding and experience of storage
management is better than mine. But that's not the key question, in a
volunteer organisation. The questions I ask, repeating Jan's comments in
another message, are.

Nicholas Clark


Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Elaine Ashton

On Mar 28, 2010, at 12:52 AM, Arthur Corliss wrote:
 
 :-) You'll have to pardon my indiscriminate epithets.  The barbs are coming
 from multiple directions.  My point still stands, however.  Your experience,
 however worthy, has zero bearing on whether or not my experience is
 just as worthy.  Even more so when you guys have zero clue who you're talking
 to.  And you shouldn't have to know.  I would have thought simple communal 
 and professional courtesy would be extended and all points considered in 
 earnest.  Which does not appear to be the case.

I'm not sending any barbs, only my reasonable opinion borne from years on the 
reality-based operations side of this equation. As for who you are, it doesn't 
matter as I work daily with those who wrote, and continue to write, large 
chunks of operating systems, X, etc., and though their legend may precede them 
when it comes to my having to implement what works fabulously in their 
imagination, I do my best to bring them back to the grim reality that is 
operations. It's a frequent problem of engineers and those of us stuck having 
to live with and fix their grand ideas. Lofty goals usually die somewhere 
between dreams and production. 

 Ah, you're one of them.  All objects look like nails when all you have is a
 hammer, eh?  Rsync is a good tool, but like Perl, it isn't the perfect tool
 for all tasks.  You've obviously exceeded what the tool was designed for,
 it's only logical to look for (or write) another tool.  Ironically, what I'm 
 suggesting is so basic that rsync can be replaced by a script which will 
 likely run on every mirror out there with no more fuss than rsync.

Well, you'll have to forgive those who mock your naïveté: if it were so basic 
and trivial to replace rsync, it would have been done several times over by now, 
as its limitations are well known to all who use it on any large scale. 
However, it is a well-known, well-used, multi-platform and time-tested tool 
that will not be unseated very easily without good reason, and a reason that 
reads something along the lines of "improving performance on an archive that 
should have been trimmed back a bit" is not a compelling reason for adoption. 

 What you're overlooking is that CPAN has, and will, continue to grow.  Even 
 if you remove the cruft now at some point it might grow to the same size just 
 with fresh files.  When that happens, you're right back where you are now.  
 Rsync can't cut it, it wasn't designed for this.

And this is a good point to make, yes, it will continue to grow and I know that 
the current manager(s) of nic.funet.fi have commented on the burden it presents 
to the system which is also home to a number of other mirrors. You cannot 
assume that the generosity and the resources of the mirror ops are limitless 
and finding out where that limit lies will come too late to make amends. 

Pruning back the archive is a good compromise until and unless another solution 
can be done that will not bother the mirror ops terribly much in terms of real 
work.

e.

Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Nicholas Clark
On Sat, Mar 27, 2010 at 08:52:22PM -0800, Arthur Corliss wrote:
 On Sat, 27 Mar 2010, Elaine Ashton wrote:
 
 Actually, I thought I was merely offering my opinion both as the sysadmin 
 for the canonical CPAN mothership and as an end-user. If that makes me a 
 prick, well, I suppose I should go out and buy one :)
 
 :-) You'll have to pardon my indiscriminate epithets.  The barbs are coming
 from multiple directions.  My point still stands, however.  Your experience,
 however worthy, has zero bearing on whether or not my experience is
 just as worthy.  Even more so when you guys have zero clue who you're talking

Are you running a large public mirror site, where you don't even have
knowledge of who is mirroring from you?

(Not even knowledge, let alone channels of communication with, let alone
control over)

Because (as I see it, not having done any of this) the logistics of that are
going to have as much bearing on trying to change protocols as the actual
technical merits of the protocol itself.

Most of the cost of rsync is an externality to the clients. If one has an
existing mirror and is using rsync to keep it up to date, what's the
incentive to change?

 Sounds like you may be hamstrung by your own bureaucracy, but that's rarely
 the case in most of the places I've worked.  Not to mention that between
 passive mode FTP or even using an HTTP proxy (most of which support FTP
 requests) what I'm proposing is relatively painless, simple, and easy to
 secure.  This concern I suspect is a non-issue for most mirror operators.
 Even if it was, allow them to pull it via HTTP for all I care.  Either one
 is significantly more efficient than rsync.

I'm missing something here, I suspect. How can HTTP be more efficient than
rsync? The only obvious method to me of mirroring a CPAN site by HTTP is to
instruct a client (such as wget) to get it all. In which case, in the course
of doing this the client is going to recurse over the entire directory tree
of the server, which, I thought, was functionally equivalent to the behaviour
of the rsync server.
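
For illustration only: one way an HTTP pull could avoid walking the whole
tree is for the server to publish a single manifest (path plus digest) that
clients download and diff locally. Nothing in this thread specifies such a
format; the manifest layout, the function names and the sample paths below
are assumptions, and this is a sketch of the idea rather than anyone's
actual proposal.

```python
import hashlib
from typing import Dict, List, Tuple

# Assumed manifest format: relative path -> SHA-256 digest of the file.
Manifest = Dict[str, str]


def build_manifest(files: Dict[str, bytes]) -> Manifest:
    """Server side: compute one digest per file (could be done at upload time)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def plan_sync(remote: Manifest, local: Manifest) -> Tuple[List[str], List[str]]:
    """Client side: decide what to fetch and delete by comparing manifests only."""
    to_fetch = sorted(p for p, digest in remote.items() if local.get(p) != digest)
    to_delete = sorted(p for p in local if p not in remote)
    return to_fetch, to_delete


# Hypothetical sample data standing in for a master site and a stale mirror.
server_files = {
    "authors/id/A/AB/ABC/Foo-1.0.tar.gz": b"foo 1.0",
    "authors/id/A/AB/ABC/Foo-1.1.tar.gz": b"foo 1.1",
    "authors/id/X/XY/XYZ/Bar-0.2.tar.gz": b"bar 0.2",
}
mirror_files = {
    "authors/id/A/AB/ABC/Foo-1.0.tar.gz": b"foo 1.0",
    "authors/id/X/XY/XYZ/Bar-0.1.tar.gz": b"bar 0.1",
}

to_fetch, to_delete = plan_sync(
    build_manifest(server_files), build_manifest(mirror_files)
)
print("fetch:", to_fetch)
print("delete:", to_delete)
```

With this shape the server's per-client cost is one static file plus the
files that actually changed, whereas a recursive wget still has to list
every directory on every run.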

Nicholas Clark


Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Randy Kobes
On 2010-03-28, at 9:13 AM, Elaine Ashton wrote:

 On Mar 28, 2010, at 12:52 AM, Arthur Corliss wrote:
 
 What you're overlooking is that CPAN has, and will, continue to grow.  Even 
 if you remove the cruft now at some point it might grow to the same size 
 just with fresh files.  When that happens, you're right back where you are 
 now.  Rsync can't cut it, it wasn't designed for this.
 
 And this is a good point to make, yes, it will continue to grow and I know 
 that the current manager(s) of nic.funet.fi have commented on the burden it 
 presents to the system which is also home to a number of other mirrors. You 
 cannot assume that the generosity and the resources of the mirror ops are 
 limitless and finding out where that limit lies will come too late to make 
 amends. 
 
 Pruning back the archive is a good compromise until and unless another 
 solution can be done that will not bother the mirror ops terribly much in 
 terms of real work.
 
 e.

Has some sort of disk quota system for CPAN author accounts ever been 
considered?

-- 
best regards,
Randy



Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Arthur Corliss

On Sun, 28 Mar 2010, Ask Bjørn Hansen wrote:


You are misunderstanding the problem of changing the mirroring mechanism.


I am not misunderstanding, I'm just willing to accept the reality for what
it is.  Rsync does not scale.  Period.


Making new software is nice and good -- Andreas already has something that's 
better for the PAUSE data.


<G>  That makes my point all the more compelling, then.  Some of the work
has already been done.


Getting 1000s of mirrors to use your software (rather than rsync, which they 
use for ALL OTHER mirrors) -- not so easy.


Perhaps, but it's also possible that it might not be as bad as you think,
either.  You have a strong case to be made that the entire ecosystem
benefits from making this change (particularly in a tiered mirroring
environment), and I'd be surprised if the majority of the mirror operators 
aren't sympathetic and cooperative.  As a sys-admin I watch my SAR reports
like a hawk, I'm sure they're no different.

And that's not to say you have to eliminate rsync.  If you can get half of
them to stop, you'll still have some significant long term gains.

--Arthur Corliss
  Live Free or Die

Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Arthur Corliss

On Sun, 28 Mar 2010, Elaine Ashton wrote:


I'm not sending any barbs, only my reasonable opinion borne from years on the 
reality-based operations side of this equation. As for who you are, it doesn't 
matter as I work daily with those who wrote, and continue to write, large 
chunks of operating systems, X, etc., and though their legend may precede them 
when it comes to my having to implement what works fabulously in their 
imagination, I do my best to bring them back to the grim reality that is 
operations. It's a frequent problem of engineers and those of us stuck having 
to live with and fix their grand ideas. Lofty goals usually die somewhere 
between dreams and production.


Ah, let the chest thumping begin.  My point is that regardless of where the 
idea comes from if it comes from a solid rationale it should be given 
consideration.  And to date I have yet to see any one of you refute my 
technical understanding of the problem, only my political understanding of 
the problem.  I/O is the issue, and it is driven predominantly by rsync.



Well, you'll have to forgive those who mock your naïveté: if it were so basic 
and trivial to replace rsync, it would have been done several times over by now, 
as its limitations are well known to all who use it on any large scale. 
However, it is a well-known, well-used, multi-platform and time-tested tool 
that will not be unseated very easily without good reason, and a reason that 
reads something along the lines of "improving performance on an archive that 
should have been trimmed back a bit" is not a compelling reason for adoption.


Naivete?  Again:  show me where my assertions about the primary root of your
problem are incorrect?  Show me how pruning CPAN isn't a temporary band-aid
that fails to address a fundamental weakness in the syncing process?  You
haven't.  You can try to dress it up any way you like in an effort to discredit
me, but until you do based on the facts, you have nothing.

Rsync is a good tool, but for different use case scenarios.


And this is a good point to make, yes, it will continue to grow and I know that 
the current manager(s) of nic.funet.fi have commented on the burden it presents 
to the system which is also home to a number of other mirrors. You cannot 
assume that the generosity and the resources of the mirror ops are limitless 
and finding out where that limit lies will come too late to make amends.


<G> And you make my point for me.  I'm sure he would love to find a more
efficient use of his I/O.  I assume nothing, I only allow that you'll find
more interest than you assume in managing I/O.  Nor does what I'm proposing
preclude the intractable from continuing to use rsync.  Given that rsync is
the driver of your I/O problem, taking away any significant percentage of
that load will pay the largest dividends.


Pruning back the archive is a good compromise until and unless another solution 
can be done that will not bother the mirror ops terribly much in terms of real 
work.


At least you admit you're only treating the symptoms now, not the disease
itself.  Sure, it will buy you some time, but there'll also be some
political problems to work through which will likely burn as many if not
more man-hours than just treating the disease.  And in the end time runs
out and the problem remains.

Look, I don't care if you guys decide against it, but let's be honest about
the compromises you're making.  Hell, pruning isn't even a compromise, it's
not a solution, it's only a delaying tactic.

--Arthur Corliss
  Live Free or Die

Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Elaine Ashton

On Mar 28, 2010, at 12:48 PM, Randy Kobes wrote:

 
 Has some sort of disk quota system for CPAN author accounts ever been 
 considered?

Not specifically, no, at least not that I'm aware of. That would have to be 
implemented on PAUSE and quotas frequently end up not solving the real problem 
and create a headache both for the sysadmin and the users. 

Jarkko and I were talking about it this morning - as he's not in favour of 
pruning - while trying to think of a way around the size problem, and he 
reminded me of the idea (Andreas' suggestion a while back, if I recall 
correctly) that there be an A, B and C 'PAN' of sorts where you could pull 
varying degrees of content - sort of CPAN::Mini writ large. I don't think that idea ever 
got any traction because it wouldn't really solve some of the issues for the 
major upstream mirrors and the mechanics of deciding where to draw the lines 
between them. I still think it's a good idea though.

I do very much like Tim's proposal for giving old modules a push to BackPAN 
since, with proper communication of the changes to the authors along with a way 
to mark exceptions, this would rid CPAN of a lot of cruft that should be on 
BackPAN anyway.

e.


Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Andy Armstrong
On 28 Mar 2010, at 19:32, Elaine Ashton wrote:
 Jarkko and I were talking about it this morning - as he's not in favour of 
 pruning - while trying to think of a way around the size problem, and he 
 reminded me of the idea (Andreas' suggestion a while back, if I recall 
 correctly) that there be an A, B and C 'PAN' of sorts where you could pull 
 varying degrees of content - sort of CPAN::Mini writ large. I don't think that 
 idea ever got any traction because it wouldn't really solve some of the 
 issues for the major upstream mirrors and the mechanics of deciding where to 
 draw the lines between them. I still think it's a good idea though.

We're nearly there if A == a CPAN::Mini style mirror, B == the current mirror 
pruned and C == BackPAN.

So the actions to make that happen are:

* give the current clients specific support for this
* generate a master mini mirror that other mini mirrors can pull from.
* prune

If we agree that this is a good solution I'm happy to do some work on it - I 
could host the mini master and I'd be happy to send Andreas a patch for CPAN.pm 
to support this scheme.
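
As a rough sketch of how the three tiers relate, assuming A holds only the
newest release of each distribution (which is what CPAN::Mini mirrors), C
holds every release ever uploaded, and B is C with older releases pruned.
The pruning rule here (keep the newest `keep` releases per distribution) is
purely hypothetical, since no rule has been agreed in this thread, and the
`Release` type and sample data are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Release(NamedTuple):
    dist: str                 # distribution name, e.g. "Foo"
    version: Tuple[int, ...]  # already-parsed version, e.g. (1, 2)
    filename: str             # tarball name


def build_tiers(releases: List[Release], keep: int = 3) -> Dict[str, Set[str]]:
    """Split releases into tiers A (latest only), B (pruned) and C (everything)."""
    by_dist: Dict[str, List[Release]] = defaultdict(list)
    for rel in releases:
        by_dist[rel.dist].append(rel)

    tiers: Dict[str, Set[str]] = {"A": set(), "B": set(), "C": set()}
    for rels in by_dist.values():
        newest_first = sorted(rels, key=lambda r: r.version, reverse=True)
        tiers["A"].add(newest_first[0].filename)
        # Hypothetical pruning rule: keep only the newest `keep` releases.
        tiers["B"].update(r.filename for r in newest_first[:keep])
        tiers["C"].update(r.filename for r in newest_first)
    return tiers


# Made-up sample data.
releases = [
    Release("Foo", (1, 0), "Foo-1.0.tar.gz"),
    Release("Foo", (1, 1), "Foo-1.1.tar.gz"),
    Release("Foo", (1, 2), "Foo-1.2.tar.gz"),
    Release("Foo", (1, 3), "Foo-1.3.tar.gz"),
    Release("Bar", (0, 1), "Bar-0.1.tar.gz"),
]
tiers = build_tiers(releases, keep=2)
for name in ("A", "B", "C"):
    print(name, sorted(tiers[name]))
```

Whatever pruning rule is chosen, each tier is a subset of the next (A within
B within C), so a client or mirror could pick one level and stay with it.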

-- 
Andy Armstrong, Hexten