Re: Trimming the CPAN - Automatic Purging
On 27 Mar 2010, at 00:59, Elaine Ashton wrote:
> The only snag I can foresee in trimming back on the abundance of modules is the case where some modules have version requirements for other modules where it will barf with a mismatch/newer version of the required module (I bumped into this recently but can't remember exactly which module it was) but I think it's rare and the practice should be discouraged.

Maybe that could be solved by having the clients (and maybe search.cpan.org) automagically fall back to a backpan mirror?

And, yes, if it's considered a good idea I /am/ prepared to do something about it.

-- Andy Armstrong, Hexten
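Andy's suggestion, having a client fall back to a BackPAN mirror when a required version has been trimmed from CPAN, could look roughly like the Python sketch below. It is an illustration under stated assumptions, not how any existing CPAN client or search.cpan.org behaves: the mirror hosts are placeholders, `fetch_with_fallback` and `NotFound` are hypothetical names, and the fetcher is injected so that a real client could map an HTTP 404 to `NotFound`.

```python
from typing import Callable, List, Sequence, Tuple
from urllib.parse import urljoin

# Placeholder hosts (hypothetical): the live CPAN mirror is tried first,
# a BackPAN mirror last.
DEFAULT_MIRRORS = (
    "https://cpan.example.org/",
    "https://backpan.example.org/",
)


class NotFound(Exception):
    """Raised by a fetcher when a mirror does not carry the requested file."""


def fetch_with_fallback(
    path: str,
    fetch: Callable[[str], bytes],
    mirrors: Sequence[str] = DEFAULT_MIRRORS,
) -> Tuple[str, bytes]:
    """Try each mirror in order and return (url, content) from the first hit.

    `path` is relative to the CPAN root, e.g. "authors/id/A/AB/ABC/Foo-0.9.tar.gz".
    """
    tried: List[str] = []
    for base in mirrors:
        url = urljoin(base, path)
        try:
            return url, fetch(url)
        except NotFound:
            tried.append(url)  # not on this mirror; fall through to the next one
    raise NotFound(f"{path} not found on any mirror: {tried}")
```

Because the live mirror is always tried first, clients only pay the extra round trip for releases that have actually been purged.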
Re: Trimming the CPAN - Automatic Purging
On 27 Mar 2010, at 00:59, Andy Armstrong wrote:
> On 27 Mar 2010, at 00:59, Elaine Ashton wrote:
>> The only snag I can foresee in trimming back on the abundance of modules is the case where some modules have version requirements for other modules where it will barf with a mismatch/newer version of the required module (I bumped into this recently but can't remember exactly which module it was) but I think it's rare and the practice should be discouraged.
>
> Maybe that could be solved by having the clients (and maybe search.cpan.org) automagically fall back to a backpan mirror?
>
> And, yes, if it's considered a good idea I am prepared to do something about it.

Exactly what I wrote in my previous mail; nobody commented, so I was wondering if I was wrong!

In any case, we do now have a better understanding of the problem and, most importantly, we have a real user (Elaine) wishing for something to be done.

Andreas, Chris, Tatsuhiko and others have done a tremendous job implementing stuff, but I must admit that I would have liked to see a list of what they are implementing. Not to mention the need to see a context diagram.

IMVHO the first thing we should do is draw up a list of requirements describing what the CPAN actors (clients, PAUSE, mirrors, search engines, ...) should do. Maybe that document already exists somewhere. The implications for CPAN, ExtUtils, Module::Build, and all the other, still unknown, modules are, I believe, not to be underestimated.

Andy (since you are the first to really volunteer (and now you don't have any choice anymore ;)), count me in for whatever development time is needed to get things moving.

Ask, this thread is getting a tad long, and although I'm very happy to see more input, requirements and ideas, would it be possible to see some condensed results somewhere?

Cheers, Nadim.
Re: Trimming the CPAN - Automatic Purging
On Sat, 27 Mar 2010, Nicholas Clark wrote:
>> I
> You? Or someone else?
>
> I am quite happy to agree that your understanding and experience of storage management is better than mine. But that's not the key question, in a volunteer organisation. The questions I ask, repeating Jan's comments in another message, are.

Oh, I understand that fully. And I'd be happy to lend some of my time. But you don't make people inclined to help when people are lobbing snarky comments like "we'll wait breathlessly for you to do it."

The impression I'm getting from most of you right now is that you're hell-bent on solving the problem your way, and no one is interested in exploring the technical merits of other approaches. Hell, I would even help with work towards your desired method *if* I thought that was the consensus after a genuine exchange and consideration of ideas. I definitely won't should it appear that we have some kind of elitist cabal that will make their decision in isolation. If that's going to be the case then this should never have been raised on an open forum like the module authors' list.

Quite frankly, at times some discussions on this list fail the concept of a technical meritocracy, and tend towards an established aristocracy.

--Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Fri, Mar 26, 2010 at 03:02:22PM -0800, Arthur Corliss wrote:
> Why use rsync, then? Why not have checkpointed logs on cpan with additions/removals logged by date so you can roll forward on the client, processing only those files? It would be trivial to set up and a lot more efficient.

Because the most important mirror sites mirror CPAN as just a very small part of what they do. They won't want to have to use weird tools for just that tiny corner of their disk.

-- David Cantrell | London Perl Mongers Deputy Chief Heretic

I caught myself pulling grey hairs out of my beard. I'm definitely not going grey, but I am going vain.
Re: Trimming the CPAN - Automatic Purging
> Oh, I understand that fully. And I'd be happy to lend some of my time. But you don't make people inclined to help when people are lobbing snarky comments like "we'll wait breathlessly for you to do it."

The time-honored tradition of many open source communities is to talk. And talk. And talk. The problem is that this solves nothing. To do, does. You are free to decide to take this as a personal insult.
Re: Trimming the CPAN - Automatic Purging
On Sat, 27 Mar 2010, Jarkko Hietaniemi wrote:
> The time-honored tradition of many open source communities is to talk. And talk. And talk. The problem is that this solves nothing. To do, does. You are free to decide to take this as a personal insult.

I didn't take it as an insult, I took it as what it was -- a dodge. You already have your minds made up and are not willing to evaluate options on their merits. Let's just be honest about what's going on here.

--Arthur Corliss Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Sat, 27 Mar 2010, Elaine Ashton wrote:
> Actually, I thought I was merely offering my opinion both as the sysadmin for the canonical CPAN mothership and as an end-user. If that makes me a prick, well, I suppose I should go out and buy one :)

:-) You'll have to pardon my indiscriminate epithets. The barbs are coming from multiple directions. My point still stands, however. Your experience, however worthy, has zero bearing on whether or not my experience is just as worthy. Even more so when you guys have zero clue who you're talking to. And you shouldn't have to know. I would have thought simple communal and professional courtesy would be extended and all points considered in earnest, which does not appear to be the case.

> And you're disregarding a considerable problem: rsync is a well-established tool for mirroring that is easy to use and works on a very wide range of platforms. Asking mirror ops to adopt a new tool for mirroring one mirror, when they often have several or more, likely won't be met with much enthusiasm and would create two tiers of CPAN mirrors, those using rsync and those not, which would not only complicate something which should remain simple but, again, doesn't address the size of the archive and the multitude of small files that are always a consideration no matter what you're serving them up with.

Ah, you're one of them. All objects look like nails when all you have is a hammer, eh? Rsync is a good tool, but like Perl, it isn't the perfect tool for all tasks. You've obviously exceeded what the tool was designed for; it's only logical to look for (or write) another tool. Ironically, what I'm suggesting is so basic that rsync can be replaced by a script which will likely run on every mirror out there with no more fuss than rsync.

> FTP? It's 2010 and very few corp firewalls allow ftp in or out. I can't remember the last time I even used ftp come to think of it.
> I had to go through 2 layers of network red tape just to get rsync for a particular system I wanted to mirror CPAN to at work. Asking for FTP would have been met with a big no or a cackle, depending on which of the nyetwork masters got the request first.

Sounds like you may be hamstrung by your own bureaucracy, but that's rarely the case in most of the places I've worked. Not to mention that between passive-mode FTP and HTTP proxies (most of which support FTP requests), what I'm proposing is relatively painless, simple, and easy to secure. I suspect this concern is a non-issue for most mirror operators. Even if it were, allow them to pull it via HTTP for all I care. Either one is significantly more efficient than rsync.

> How is replacing rsync, a standard and widely used tool, simpler for mirror ops?
>
> I suppose I don't understand the opposition to trimming off the obvious cruft on CPAN to lighten the load when BackPAN exists to archive them. There is already CPAN::Mini (which was created back when CPAN was an ever-so-tiny 1.2GB) so it's not as though lightening the load is a new idea or an unwelcome one.

I'm not opposed to trimming the cruft, but I am opposed to ignorant knee-jerk reactions bereft of any empirical data (or at least none that you've shared). The cruft, while being cruft, isn't inherently evil. You have a basic I/O and state problem. And the I/O generated is predominantly caused by rsync trying to (re)assemble state on the file set, *per* request. More appallingly, most of that state image being generated is state that hasn't changed in quite a while. Literally years in many cases. So why are we wasting cycles and I/O performing massively redundant work? That's why having PAUSE implement a transaction log, and perhaps a cron job on the master server doing daily checkpointed file manifests, is so much more efficient. An in-sync mirror only needs to download the latest transaction logs and play them forward (delete certain files, download others, etc).
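The roll-forward step described above could look roughly like the following Python sketch. Everything about the log is an assumption, since no such PAUSE log format is defined in this thread: one entry per line as `<seq> <add|delete> <path>`, where `seq` is a monotonically increasing sequence number (a date would only be safe if it were unique per entry). `parse_log` and `roll_forward` are hypothetical names. Only entries newer than the mirror's last checkpoint are processed, and the last action recorded for a path wins, so a file added and then deleted inside the window is never downloaded.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class LogEntry:
    seq: int     # monotonically increasing sequence number (assumed)
    action: str  # "add" or "delete" (assumed vocabulary)
    path: str    # file path relative to the CPAN root


def parse_log(lines: Iterable[str]) -> List[LogEntry]:
    """Parse lines of the hypothetical form '<seq> <add|delete> <path>'."""
    entries: List[LogEntry] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        seq, action, path = line.split(maxsplit=2)
        if action not in ("add", "delete"):
            raise ValueError(f"unknown action {action!r} in {line!r}")
        entries.append(LogEntry(int(seq), action, path))
    return entries


def roll_forward(
    entries: Iterable[LogEntry], last_applied: int
) -> Tuple[Set[str], Set[str], int]:
    """Collapse entries newer than `last_applied` into the net work to do.

    Returns (paths to download, paths to delete, new checkpoint).
    """
    final_action: Dict[str, str] = {}
    checkpoint = last_applied
    for entry in sorted(entries, key=lambda e: e.seq):
        if entry.seq <= last_applied:
            continue  # already applied on this mirror
        final_action[entry.path] = entry.action  # later entries override earlier ones
        checkpoint = entry.seq
    to_download = {p for p, a in final_action.items() if a == "add"}
    to_delete = {p for p, a in final_action.items() if a == "delete"}
    return to_download, to_delete, checkpoint
```

The mirror would store the returned checkpoint and pass it back in on the next run, so each sync touches only files that changed since the previous one instead of re-examining the whole tree.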
And, gee, just about every author on the list could write *that* sync agent in an evening. Out-of-sync mirrors can start by working off the checkpoint manifest, get what's missing, and roll forward.

What you're overlooking is that CPAN has grown, and will continue to grow. Even if you remove the cruft now, at some point it might grow to the same size, just with fresh files. When that happens, you're right back where you are now. Rsync can't cut it; it wasn't designed for this. Whether you like it or not, even on a pared-down CPAN rsync is easily your most inefficient process on the server. If you're not willing to optimize that, then you really don't care about optimization at all.

--Arthur Corliss Live Free or Die
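For the out-of-sync case, a mirror could bootstrap by comparing a checkpoint manifest against the files it already has. This is a minimal Python sketch assuming a hypothetical manifest that maps each path (relative to the CPAN root) to its SHA-256 digest; `local_inventory` and `diff_against_manifest` are illustrative names, not an existing tool. Hashing every local file is expensive, but a mirror would only do it once when catching up, then go back to replaying transaction logs from the manifest's checkpoint.

```python
import hashlib
import os
from typing import Dict, Mapping, Set, Tuple


def sha256_of(path: str) -> str:
    """Hash a file in 64 KiB chunks so large tarballs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def local_inventory(root: str) -> Dict[str, str]:
    """Map every file under `root` (as a '/'-separated relative path) to its SHA-256."""
    inventory: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            inventory[rel] = sha256_of(full)
    return inventory


def diff_against_manifest(
    manifest: Mapping[str, str], local: Mapping[str, str]
) -> Tuple[Set[str], Set[str]]:
    """Return (paths to fetch, paths to delete) so that `local` matches `manifest`.

    A path is fetched if it is missing locally or its digest differs; a local
    path that the manifest does not list is deleted.
    """
    to_fetch = {p for p, d in manifest.items() if local.get(p) != d}
    to_delete = set(local) - set(manifest)
    return to_fetch, to_delete
```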