Re: Trimming the CPAN - Automatic Purging

2010-04-04 Thread David Nicol
It hasn't been done because it's outside of the scope of design for rsync. It's meant to sync arbitrary filesets in which many, if not all, changes are made out of band. It's decidedly non-trivial to implement in that mode unless you're willing to accept a certain window in which your database

Re: Trimming the CPAN - Automatic Purging

2010-04-02 Thread Ask Bjørn Hansen
On Apr 2, 2010, at 1:50, Arthur Corliss wrote: And my assertion has been that the excessive stats by the server are a bigger impediment to synchronization than the inode count. Well, then one of us don't understand how file systems etc work. :-) - ask

Re: Trimming the CPAN - Automatic Purging

2010-04-02 Thread Arthur Corliss
On Fri, 2 Apr 2010, Ask Bjørn Hansen wrote: On Apr 2, 2010, at 1:50, Arthur Corliss wrote: And my assertion has been that the excessive stats by the server are a bigger impediment to synchronization than the inode count. Well, then one of us don't understand how file systems etc work. :-)

Re: Trimming the CPAN - Automatic Purging

2010-04-01 Thread David Precious
On Thursday 01 April 2010 05:39:27 David Nicol wrote: On Wed, Mar 31, 2010 at 7:43 AM, Ask Bjørn Hansen a...@perl.org wrote: The main point here is that we can't use 20 inodes per distribution. so don't. How much reengineering would be needed to keep CPAN in a database instead of a file

Re: Trimming the CPAN - Automatic Purging

2010-04-01 Thread Eakin, Lee
Much of this discussion is beyond my depth but in terms of keeping it simple, and trying to limit the stat calls on the upstream servers, what about DNS as a replication model? You could break up the tree at logical divisions similar to zones and assign them serial numbers (say a .serial file)
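A rough sketch of the serial-number idea in the message above, assuming (this is not spelled out in the thread) that each zone of the tree publishes an integer serial that only ever increases, and that a mirror remembers the last serial it synced for each zone. The zone names and the function `zones_to_sync` are hypothetical, chosen only for illustration:

```python
def zones_to_sync(upstream_serials, local_serials):
    """Return the zones whose upstream serial is newer than the local one.

    Both arguments map a zone name (a subtree of the archive) to an
    integer serial, as might be read from a per-zone .serial file.
    A zone the mirror has never seen is treated as stale.
    """
    stale = []
    for zone, serial in sorted(upstream_serials.items()):
        if local_serials.get(zone, -1) < serial:
            stale.append(zone)
    return stale


# Hypothetical zones: only the ones that changed need a full tree walk.
upstream = {"authors/id/A": 12, "authors/id/B": 7, "modules": 3}
local = {"authors/id/A": 12, "authors/id/B": 5}
print(zones_to_sync(upstream, local))
```

The point of the design is that the mirror compares one small value per zone instead of asking the upstream server to stat every file; only zones whose serial moved would then be handed to rsync or a similar tool.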

Re: Trimming the CPAN - Automatic Purging

2010-04-01 Thread Arthur Corliss
On Fri, 2 Apr 2010, Ask Bjørn Hansen wrote: I can't believe I'm doing this, but ... :-) All for entertainment's sake... The main point here is that we can't use 20 inodes per distribution. It's Just Nuts. Sure, it's only something like 400k files/inodes now - but at the rate it's going

Re: Trimming the CPAN - Automatic Purging

2010-03-31 Thread Nicholas Clark
On Wed, Mar 31, 2010 at 01:03:51PM +1100, Adam Kennedy wrote: I've said nothing till now, because I figured more noise wouldn't help much. But I quite like the rsync daemon/proxy idea, and as it so happens I'm attending the OzLabs Unconference in 3 weeks time to hang out with Tridge, Rusty

Re: Trimming the CPAN - Automatic Purging

2010-03-31 Thread Ask Bjørn Hansen
On Mar 31, 2010, at 6:52, David Nicol wrote: new proposal: Make modules pay rent in order to remain on a mirror. Rent could be in the form of actual user interest, or good reviews. How you are proposing purging useless stuff from CPAN -- that's a lot more radical than Tim's proposal of just

Re: Trimming the CPAN - Automatic Purging

2010-03-31 Thread David Nicol
On Wed, Mar 31, 2010 at 7:43 AM, Ask Bjørn Hansen a...@perl.org wrote: The main point here is that we can't use 20 inodes per distribution. so don't. How much reengineering would be needed to keep CPAN in a database instead of a file system?

Re: Trimming the CPAN - Automatic Purging

2010-03-30 Thread David Cantrell
On Sun, Mar 28, 2010 at 11:48:00AM -0500, Randy Kobes wrote: Has some sort of disk quota system for CPAN author accounts ever been considered? There are authors with 100 distributions. There are authors with just one distribution. There are authors with big distributions, and authors with

Re: Trimming the CPAN - Automatic Purging

2010-03-30 Thread David Cantrell
On Mon, Mar 29, 2010 at 12:02:11AM -0800, Arthur Corliss wrote: I think it would be a worthy cause ultimately, but certainly a much longer time to implementation, and considerably more effort. Kind of sounds like the normal stonewalling I've been getting these last few days by our resident

Re: Trimming the CPAN - Automatic Purging

2010-03-29 Thread Steffen Mueller
Hi Elaine, Elaine Ashton wrote: On Mar 28, 2010, at 12:48 PM, Randy Kobes wrote: Jarkko and I were talking about it this morning - as he's not in favour of pruning - while trying to think of a way around the size problem and he reminded me of the idea that, if I recall correctly was Andreas'

Re: Trimming the CPAN - Automatic Purging

2010-03-29 Thread Arthur Corliss
On Sun, 28 Mar 2010, Andy Armstrong wrote: We're nearly there if A == a CPAN::Mini style mirror, B == the current mirror pruned and C == backpan. So the actions to make that happen are: * give the current clients specific support for this * generate a master mini mirror that other mini

Re: Trimming the CPAN - Automatic Purging

2010-03-29 Thread David Cantrell
On Sat, Mar 27, 2010 at 09:38:16PM -0400, Elaine Ashton wrote: I suppose I don't understand the opposition to trimming off the obvious cruft on CPAN to lighten the load when BackPAN exists to archive them. There is already CPAN::Mini (which was created back when CPAN was an ever-so-tiny

Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Nicholas Clark
On Sat, Mar 27, 2010 at 10:52:05AM -0800, Arthur Corliss wrote: I think I was quite explicit in saying that efficiencies should be pursued in multiple areas, but the predominant bitch I took away from your thread dealt with the burden of synchronizing mirrors. What's the easiest way to

Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Elaine Ashton
On Mar 28, 2010, at 12:52 AM, Arthur Corliss wrote: :-) You'll have to pardon my indiscriminate epithets. The barbs are coming from multiple directions. My point still stands, however. Your experience, however worthy, has zero bearing on whether or not my experience is just as worthy.

Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Nicholas Clark
On Sat, Mar 27, 2010 at 08:52:22PM -0800, Arthur Corliss wrote: On Sat, 27 Mar 2010, Elaine Ashton wrote: Actually, I thought I was merely offering my opinion both as the sysadmin for the canonical CPAN mothership and as an end-user. If that makes me a prick, well, I suppose I should go

Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Randy Kobes
On 2010-03-28, at 9:13 AM, Elaine Ashton wrote: On Mar 28, 2010, at 12:52 AM, Arthur Corliss wrote: What you're overlooking is that CPAN has, and will, continue to grow. Even if you remove the cruft now at some point it might grow to the same size just with fresh files. When that

Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Arthur Corliss
On Sun, 28 Mar 2010, Ask Bjørn Hansen wrote: You are misunderstanding the problem of changing the mirroring mechanism. I am not misunderstanding, I'm just willing to accept the reality for what it is. Rsync does not scale. Period. Making new software is nice and good -- Andreas already

Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Arthur Corliss
On Sun, 28 Mar 2010, Elaine Ashton wrote: I'm not sending any barbs, only my reasonable opinion borne from years on the reality-based operations side of this equation. As for who you are, it doesn't matter as I work daily with those who wrote, and continue to write, large chunks of operating

Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Elaine Ashton
On Mar 28, 2010, at 12:48 PM, Randy Kobes wrote: Has some sort of disk quota system for CPAN author accounts ever been considered? Not specifically, no, at least not that I'm aware of. That would have to be implemented on PAUSE and quotas frequently end up not solving the real problem

Re: Trimming the CPAN - Automatic Purging

2010-03-28 Thread Andy Armstrong
On 28 Mar 2010, at 19:32, Elaine Ashton wrote: Jarkko and I were talking about it this morning - as he's not in favour of pruning - while trying to think of a way around the size problem and he reminded me of the idea that, if I recall correctly was Andreas' suggestion a while back, there be

Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread Andy Armstrong
On 27 Mar 2010, at 00:59, Elaine Ashton wrote: The only snag I can foresee in trimming back on the abundance of modules is the case where some modules have version requirements for other modules where it will barf with a mismatch/newer version of the required module (I bumped into this

Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread nadim khemir
On 27 Mar 2010, at 00:59, Andy Armstrong wrote: On 27 Mar 2010, at 00:59, Elaine Ashton wrote: The only snag I can foresee in trimming back on the abundance of modules is the case where some modules have version requirements for other modules where it will barf with a mismatch/newer version

Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread Arthur Corliss
On Sat, 27 Mar 2010, Nicholas Clark wrote: I You? Or someone else? I am quite happy to agree that your understanding and experience of storage management is better than mine. But that's not the key question, in a volunteer organisation. The questions I ask, repeating Jan's comments in

Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread David Cantrell
On Fri, Mar 26, 2010 at 03:02:22PM -0800, Arthur Corliss wrote: Why use rsync, then? Why not have checkpointed logs on cpan with additions/removals logged by date so you can roll forward on the client, processing only those files? It would be trivial to set up and a lot more efficient.
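A minimal sketch of the checkpointed-log idea quoted above, under assumptions the thread does not state: the server publishes entries of the form (sequence number, action, path), where the action is "add" or "remove", and the client stores the sequence number of the last entry it processed. The name `roll_forward` and the tuple layout are hypothetical:

```python
def roll_forward(log_entries, checkpoint):
    """Replay log entries newer than `checkpoint`.

    Returns (paths to fetch, paths to delete, new checkpoint).
    A later entry for the same path overrides an earlier one, so a
    file that was added and then removed is never downloaded.
    """
    to_fetch, to_delete = set(), set()
    newest = checkpoint
    for seq, action, path in sorted(log_entries):
        if seq <= checkpoint:
            continue  # already processed in an earlier run
        if action == "add":
            to_fetch.add(path)
            to_delete.discard(path)
        elif action == "remove":
            to_delete.add(path)
            to_fetch.discard(path)
        newest = seq
    return sorted(to_fetch), sorted(to_delete), newest


# Hypothetical log: the client last synced at entry 1.
log = [
    (1, "add", "a.tar.gz"),
    (2, "add", "b.tar.gz"),
    (3, "remove", "a.tar.gz"),
    (4, "add", "c.tar.gz"),
]
print(roll_forward(log, 1))
```

With this shape the client touches only the files named in new log entries, so the cost of a sync depends on how much changed rather than on how many files the archive holds. It does rely on every change going through the log, which is the out-of-band concern raised elsewhere in the thread.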

Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread Jarkko Hietaniemi
Oh, I understand that fully. And I'd be happy to lend some of my time. But you don't make people inclined to help when people are lobbing snarky comments like we'll wait breathlessly for you to do it. The time-honored tradition of many open source communities is to talk. And talk. And

Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread Arthur Corliss
On Sat, 27 Mar 2010, Jarkko Hietaniemi wrote: The time-honored tradition of many open source communities is to talk. And talk. And talk. The problem is that this solves nothing. To do, does. You are free to decide to take this as a personal insult. I didn't take it as an insult, I took

Re: Trimming the CPAN - Automatic Purging

2010-03-27 Thread Arthur Corliss
On Sat, 27 Mar 2010, Elaine Ashton wrote: Actually, I thought I was merely offering my opinion both as the sysadmin for the canonical CPAN mothership and as an end-user. If that makes me a prick, well, I suppose I should go out and buy one :) :-) You'll have to pardon my indiscriminate

Re: Trimming the CPAN - Automatic Purging

2010-03-26 Thread Andy Lester
On Mar 26, 2010, at 4:55 AM, Lars Thegler wrote: I appreciate that the number of files on CPAN has implications for the infrastructure, but I feel a need to have some more factual info before conceding to such measures. Absolutely. This factual info would ideally look like this: Of the

Re: Trimming the CPAN - Automatic Purging

2010-03-26 Thread Arthur Corliss
On Fri, 26 Mar 2010, Andy Lester wrote: Absolutely. This factual info would ideally look like this: Of the 17,000 distros on CPAN, there are 8,000 that have versions more than a year older than the most recent one. If those distros with versions more than a year out of date were purged,

Re: Trimming the CPAN - Automatic Purging

2010-03-26 Thread Jarkko Hietaniemi
On Friday-201003-26 13:20, Arthur Corliss wrote: On Fri, 26 Mar 2010, Andy Lester wrote: Absolutely. This factual info would ideally look like this: Of the 17,000 distros on CPAN, there are 8,000 that have versions more than a year older than the most recent one. If those distros with

Re: Trimming the CPAN - Automatic Purging

2010-03-26 Thread Jarkko Hietaniemi
On Friday-201003-26 19:02, Arthur Corliss wrote: On Fri, 26 Mar 2010, Jarkko Hietaniemi wrote: The total size is not the problem. The number of files is. Vanilla rsync is horribly inefficient (not the protocol, which is genius, mind) because a client coming by and asking for updates

Re: Trimming the CPAN - Automatic Purging

2010-03-26 Thread Ask Bjørn Hansen
On Mar 26, 2010, at 16:02, Arthur Corliss wrote: Why use rsync, then? Why not have checkpointed logs on cpan with additions/removals logged by date so you can roll forward on the client, processing only those files? It would be trivial to set up and a lot more efficient. I find it

Re: Trimming the CPAN - Automatic Purging

2010-03-26 Thread Arthur Corliss
On Fri, 26 Mar 2010, Ask Bjørn Hansen wrote: I find it curious that everyone who's actually involved in syncing the files or running mirror servers seem to think it generally sounds like a good idea and everyone who doesn't say it's not worth the effort. Sure, I don't run a CPAN mirror, but

RE: Trimming the CPAN - Automatic Purging

2010-03-26 Thread Jan Dubois
On Fri, 26 Mar 2010, Arthur Corliss wrote: But what the hell do I know. I don't run a *CPAN* mirror, so I must be freaking clueless... It's not about what you know, but about what you are willing to do yourself. At some point you have to accept that the people who *do* the work decide *how*

Re: Trimming the CPAN - Automatic Purging

2010-03-26 Thread Elaine Ashton
On Mar 26, 2010, at 8:23 PM, Arthur Corliss wrote: Sure, I don't run a CPAN mirror, but I do manage many, many terabytes of storage as part of my day job. I think it's a tad presumptuous to disregard input just because we're not in your inner sanctum. As I mentioned in a follow up

Re: Trimming the CPAN - Automatic Purging

2010-03-25 Thread Ovid
--- On Thu, 25/3/10, David Golden xda...@gmail.com wrote: From: David Golden xda...@gmail.com I don't think it's a good idea to make it hard for people to find older versions of a distribution -- where hard means have to track it down on backpan.  (Though we could make clients better

Re: Trimming the CPAN - Automatic Purging

2010-03-25 Thread Barbie
On Thu, Mar 25, 2010 at 11:12:32AM +, Tim Bunce wrote: Currently on PAUSE you have to explicitly delete old uploads. Which often is a good thing. While BACKPAN exists, it isn't somewhere that many go to look for old distributions. For me and probably others, BACKPAN only distributions are

Re: Trimming the CPAN - Automatic Purging

2010-03-25 Thread Graham Barr
On Mar 25, 2010, at 8:42 AM, Barbie wrote: Lastly I would also personally be annoyed if only the latest versions were available, as I often make great use of the diff tool on search.cpan.org. Having only the latest version renders that great tool redundant :( I use that too :-) and it is

Re: Trimming the CPAN - Automatic Purging

2010-03-25 Thread David Cantrell
On Thu, Mar 25, 2010 at 01:42:58PM +, Barbie wrote: There are many distributions on CPAN that older versions work on a particular perl/os, but more recent ones don't. Latest isn't necessarily the greatest. If you are going to perform this then it should really feed off the CPAN Testers

Re: Trimming the CPAN - Automatic Purging

2010-03-25 Thread Chris Nandor
What Jarkko said. On Mar 25, 2010, at 08:00, Jarkko Hietaniemi wrote: I have one case where the v1 and v2 of a module are simply incompatible, but v1 still works, and unless the users have a compelling reason, they won't migrate. Pulling the rug from under them would be quite

Re: Trimming the CPAN - Automatic Purging

2010-03-25 Thread Andy Armstrong
On 25 Mar 2010, at 15:36, Chris Nandor wrote: I like that solution better [snip] But solution to what? Are we convinced there's actually a problem here? -- Andy Armstrong, Hexten

Re: Trimming the CPAN - Automatic Purging

2010-03-25 Thread Ask Bjørn Hansen
On Mar 25, 2010, at 8:38, Andy Armstrong wrote: I like that solution better [snip] But solution to what? Are we convinced there's actually a problem here? CPAN has almost 200k files. www.cpan.org says there are 17627 modules. rsyncing a gazillion files doesn't work that well (on the

Re: Trimming the CPAN - Automatic Purging

2010-03-25 Thread nadim khemir
On Mar 25, 2010, at 8:38, Andy Armstrong wrote: I like that solution better [snip] But solution to what? Are we convinced there's actually a problem here? CPAN has almost 200k files. www.cpan.org says there are 17627 modules. rsyncing a gazillion files doesn't work

Re: Trimming the CPAN - Automatic Purging

2010-03-25 Thread Barbie
On Thu, Mar 25, 2010 at 02:08:45PM -0700, Geoffrey Broadwell wrote: Forgive a lurker, but wasn't that the point of this: http://search.cpan.org/~andk/File-Rsync-Mirror-Recent-0.0.7/ When I saw that announced, I remember thinking Yay, large archive rsync problem solved! Did it not

Re: Trimming the CPAN - Automatic Purging

2010-03-25 Thread Ask Bjørn Hansen
On Mar 25, 2010, at 13:23, Eric Wilhelm wrote: Maybe CPAN mirrors are more easily updated than via a generic rsync? Is the burden only network/cpu for checking whether a bunch of old archives have changed, or does disk matter? Most CPAN mirrors use rsync. It's not realistic to make them

Re: Trimming the CPAN - Automatic Purging

2010-03-25 Thread Adam Kennedy
What he said. Most people don't mirror CPAN. They mirror many things. This is the same reason we've struggled with statistics. How do you ask someone mirroring three dozen different things to put in a special log-munging tool just for us. Adam K On Fri, Mar 26, 2010 at 10:55 AM, Ask Bjørn