Re: Trimming the CPAN - Automatic Purging
On Sat, Mar 27, 2010 at 10:52:05AM -0800, Arthur Corliss wrote:

> I think I was quite explicit in saying that efficiencies should be pursued in multiple areas, but the predominant bitch I took away from your thread dealt with the burden of synchronizing mirrors. What's the easiest way to address that pain? I don't believe it's your method. I'd look into the size issue *after* you address the incredible inefficiencies of a simple rsync. I

You? Or someone else?

I am quite happy to agree that your understanding and experience of storage management is better than mine. But that's not the key question, in a volunteer organisation. The questions I ask, repeating Jan's comments in another message, are.

Nicholas Clark
Re: Trimming the CPAN - Automatic Purging
On Mar 28, 2010, at 12:52 AM, Arthur Corliss wrote:

> :-) You'll have to pardon my indiscriminate epithets. The barbs are coming from multiple directions. My point still stands, however. Your experience, however worthy, has zero bearing on whether or not my experience is just as worthy. Even more so when you guys have zero clue who you're talking to. And you shouldn't have to know. I would have thought simple communal and professional courtesy would be extended and all points considered in earnest. Which does not appear to be the case.

I'm not sending any barbs, only my reasonable opinion born of years on the reality-based operations side of this equation. As for who you are, it doesn't matter, as I work daily with those who wrote, and continue to write, large chunks of operating systems, X, etc., and though their legend may precede them, when it comes to my having to implement what works fabulously in their imagination I do my best to bring them back to the grim reality that is operations. It's a frequent problem of engineers and those of us stuck having to live with and fix their grand ideas. Lofty goals usually die somewhere between dreams and production.

> Ah, you're one of them. All objects look like nails when all you have is a hammer, eh? Rsync is a good tool, but like Perl, it isn't the perfect tool for all tasks. You've obviously exceeded what the tool was designed for, it's only logical to look for (or write) another tool. Ironically, what I'm suggesting is so basic that rsync can be replaced by a script which will likely run on every mirror out there with no more fuss than rsync.

Well, you'll have to forgive those who mock your naïveté: if it were so basic and trivial to replace rsync, it would have been done several times over by now, as its limitations are well known to all who use it on any large scale. However, it is a well-known, well-used, multi-platform and time-tested tool that will not be unseated very easily without good reason, and a reason that reads something along the lines of "improving performance on an archive that should have been trimmed back a bit" is not a compelling reason for adoption.

> What you're overlooking is that CPAN has grown, and will continue to grow. Even if you remove the cruft now, at some point it might grow to the same size just with fresh files. When that happens, you're right back where you are now. Rsync can't cut it, it wasn't designed for this.

And this is a good point to make, yes, it will continue to grow and I know that the current manager(s) of nic.funet.fi have commented on the burden it presents to the system which is also home to a number of other mirrors. You cannot assume that the generosity and the resources of the mirror ops are limitless and finding out where that limit lies will come too late to make amends. Pruning back the archive is a good compromise until and unless another solution can be done that will not bother the mirror ops terribly much in terms of real work.

e.
Re: Trimming the CPAN - Automatic Purging
On Sat, Mar 27, 2010 at 08:52:22PM -0800, Arthur Corliss wrote:
> On Sat, 27 Mar 2010, Elaine Ashton wrote:
>
>> Actually, I thought I was merely offering my opinion both as the sysadmin for the canonical CPAN mothership and as an end-user. If that makes me a prick, well, I suppose I should go out and buy one :)
>
> :-) You'll have to pardon my indiscriminate epithets. The barbs are coming from multiple directions. My point still stands, however. Your experience, however worthy, has zero bearing on whether or not my experience is just as worthy. Even more so when you guys have zero clue who you're talking

Are you running a large public mirror site, where you don't even have knowledge of who is mirroring from you? (Not even knowledge, let alone channels of communication with, let alone control over.) Because (as I see it, not having done any of this) the logistics of that is going to have as much bearing on trying to change protocols as the actual technical merits of the protocol itself.

Most of the cost of rsync is an externality to the clients. If one has an existing mirror, and one is using rsync to keep it up to date, what's the incentive to change?

> Sounds like you may be hamstrung by your own bureaucracy, but that's rarely the case in most of the places I've worked. Not to mention that between passive mode FTP or even using an HTTP proxy (most of which support FTP requests) what I'm proposing is relatively painless, simple, and easy to secure. This concern I suspect is a non-issue for most mirror operators. Even if it was, allow them to pull it via HTTP for all I care. Either one is significantly more efficient than rsync.

I'm missing something here, I suspect. How can HTTP be more efficient than rsync? The only obvious method to me of mirroring a CPAN site by HTTP is to instruct a client (such as wget) to get it all. In which case, in the course of doing this the client is going to recurse over the entire directory tree of the server, which, I thought, was functionally equivalent to the behaviour of the rsync server.

Nicholas Clark
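
For readers weighing the question above, here is a minimal sketch of one way an HTTP or FTP mirror could avoid crawling the whole tree. It is an assumption, not a mechanism anyone in this thread describes: the master publishes a single index file, and each mirror compares it against its own index to decide what to fetch or delete. The index format, the `plan_sync` name, and the sample paths and checksums are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical index format: relative path -> (size in bytes, checksum string).
Index = Dict[str, Tuple[int, str]]


def plan_sync(remote: Index, local: Index) -> Tuple[List[str], List[str]]:
    """Return (paths to fetch, paths to delete) so that local matches remote."""
    # Fetch anything that is missing locally or whose size/checksum differs.
    fetch = sorted(p for p, meta in remote.items() if local.get(p) != meta)
    # Delete anything the master no longer lists.
    delete = sorted(p for p in local if p not in remote)
    return fetch, delete


# Made-up sample data, for illustration only.
remote_index: Index = {
    "authors/id/A/AA/AAA/Foo-1.01.tar.gz": (2048, "c3"),
    "authors/id/A/AA/AAA/Foo-1.02.tar.gz": (2100, "d4"),
    "authors/id/B/BB/BBB/Bar-0.10.tar.gz": (512, "e5"),
}
local_index: Index = {
    "authors/id/A/AA/AAA/Foo-1.00.tar.gz": (2000, "a1"),
    "authors/id/A/AA/AAA/Foo-1.01.tar.gz": (2048, "c3"),
    "authors/id/B/BB/BBB/Bar-0.10.tar.gz": (500, "zz"),
}

to_fetch, to_delete = plan_sync(remote_index, local_index)
print("fetch:", to_fetch)
print("delete:", to_delete)
```

Whether this beats rsync in practice depends on who pays to generate the index and how often it changes. The sketch only shows that the comparison itself needs one file per client rather than a walk of the server's tree.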
Re: Trimming the CPAN - Automatic Purging
On 2010-03-28, at 9:13 AM, Elaine Ashton wrote:
> On Mar 28, 2010, at 12:52 AM, Arthur Corliss wrote:
>
>> What you're overlooking is that CPAN has grown, and will continue to grow. Even if you remove the cruft now, at some point it might grow to the same size just with fresh files. When that happens, you're right back where you are now. Rsync can't cut it, it wasn't designed for this.
>
> And this is a good point to make, yes, it will continue to grow and I know that the current manager(s) of nic.funet.fi have commented on the burden it presents to the system which is also home to a number of other mirrors. You cannot assume that the generosity and the resources of the mirror ops are limitless and finding out where that limit lies will come too late to make amends. Pruning back the archive is a good compromise until and unless another solution can be done that will not bother the mirror ops terribly much in terms of real work.
>
> e.

Has some sort of disk quota system for CPAN author accounts ever been considered?

--
best regards,
Randy
Re: Trimming the CPAN - Automatic Purging
On Sun, 28 Mar 2010, Ask Bjørn Hansen wrote:

> You are misunderstanding the problem of changing the mirroring mechanism.

I am not misunderstanding, I'm just willing to accept the reality for what it is. Rsync does not scale. Period.

> Making new software is nice and good -- Andreas already has something that's better for the PAUSE data.

<G> That makes my point all the more compelling, then. Some of the work has already been done.

> Getting 1000s of mirrors to use your software (rather than rsync which they use for ALL OTHER mirrors) -- not so easy.

Perhaps, but it's also possible that it might not be as bad as you think, either. You have a strong case to be made that the entire ecosystem benefits from making this change (particularly in a tiered mirroring environment), and I'd be surprised if the majority of the mirror operators aren't sympathetic and cooperative. As a sys-admin I watch my SAR reports like a hawk, I'm sure they're no different.

And that's not to say you have to eliminate rsync. If you can get half of them to stop, you'll still have some significant long term gains.

--Arthur Corliss
Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Sun, 28 Mar 2010, Elaine Ashton wrote:

> I'm not sending any barbs, only my reasonable opinion born of years on the reality-based operations side of this equation. As for who you are, it doesn't matter, as I work daily with those who wrote, and continue to write, large chunks of operating systems, X, etc., and though their legend may precede them, when it comes to my having to implement what works fabulously in their imagination I do my best to bring them back to the grim reality that is operations. It's a frequent problem of engineers and those of us stuck having to live with and fix their grand ideas. Lofty goals usually die somewhere between dreams and production.

Ah, let the chest thumping begin. My point is that regardless of where the idea comes from, if it comes from a solid rationale it should be given consideration. And to date I have yet to see any one of you refute my technical understanding of the problem, only my political understanding of the problem. I/O is the issue, and it is driven predominantly by rsync.

> Well, you'll have to forgive those who mock your naïveté: if it were so basic and trivial to replace rsync, it would have been done several times over by now, as its limitations are well known to all who use it on any large scale. However, it is a well-known, well-used, multi-platform and time-tested tool that will not be unseated very easily without good reason, and a reason that reads something along the lines of "improving performance on an archive that should have been trimmed back a bit" is not a compelling reason for adoption.

Naivete? Again: show me where my assertions about the primary root of your problem are incorrect. Show me how pruning CPAN isn't a temporary band-aid that fails to address a fundamental weakness in the syncing process. You haven't. You can try to dress it up any way you like in an effort to discredit me, but until you do based on the facts, you have nothing. Rsync is a good tool, but for different use case scenarios.

> And this is a good point to make, yes, it will continue to grow and I know that the current manager(s) of nic.funet.fi have commented on the burden it presents to the system which is also home to a number of other mirrors. You cannot assume that the generosity and the resources of the mirror ops are limitless and finding out where that limit lies will come too late to make amends.

<G> And you make my point for me. I'm sure he would love to find a more efficient use of his I/O. I assume nothing, I only allow that you'll find more interest than you assume in managing I/O. Nor does what I'm proposing preclude the intractable from continuing to use rsync. Given that rsync is your driver of the I/O problem, taking away any significant percentage of the problem will have the largest dividends.

> Pruning back the archive is a good compromise until and unless another solution can be done that will not bother the mirror ops terribly much in terms of real work.

At least you admit you're only treating the symptoms now, not the disease itself. Sure, it will buy you some time, but there'll also be some political problems to work through which will likely burn as many if not more man-hours than just treating the disease. And in the end time runs out and the problem remains.

Look, I don't care if you guys decide against it, but let's be honest about the compromises you're making. Hell, pruning isn't even a compromise, it's not a solution, it's only a delaying tactic.

--Arthur Corliss
Live Free or Die
Re: Trimming the CPAN - Automatic Purging
On Mar 28, 2010, at 12:48 PM, Randy Kobes wrote:

> Has some sort of disk quota system for CPAN author accounts ever been considered?

Not specifically, no, at least not that I'm aware of. That would have to be implemented on PAUSE, and quotas frequently end up not solving the real problem and create a headache both for the sysadmin and the users.

Jarkko and I were talking about it this morning - as he's not in favour of pruning - while trying to think of a way around the size problem, and he reminded me of the idea (if I recall correctly, Andreas' suggestion a while back) that there be an A, B and C 'PAN' of sorts where you could pull varying degrees of content - sort of CPAN::Mini writ large. I don't think that idea ever got any traction because it wouldn't really solve some of the issues for the major upstream mirrors, and because of the mechanics of deciding where to draw the lines between them. I still think it's a good idea though.

I do very much like Tim's proposal for giving old modules a push to BackPAN since, with proper communication of the changes to the authors along with a way to mark exceptions, this would rid CPAN of a lot of cruft that should be on BackPAN anyway.

e.
Re: Trimming the CPAN - Automatic Purging
On 28 Mar 2010, at 19:32, Elaine Ashton wrote:

> Jarkko and I were talking about it this morning - as he's not in favour of pruning - while trying to think of a way around the size problem, and he reminded me of the idea (if I recall correctly, Andreas' suggestion a while back) that there be an A, B and C 'PAN' of sorts where you could pull varying degrees of content - sort of CPAN::Mini writ large. I don't think that idea ever got any traction because it wouldn't really solve some of the issues for the major upstream mirrors, and because of the mechanics of deciding where to draw the lines between them. I still think it's a good idea though.

We're nearly there if A == a CPAN::Mini style mirror, B == the current mirror pruned and C == backpan.

So the actions to make that happen are:

* give the current clients specific support for this
* generate a master mini mirror that other mini mirrors can pull from
* prune

If we agree that this is a good solution I'm happy to do some work on it - I could host the mini master and I'd be happy to send Andreas a patch for CPAN.pm to support this scheme.

--
Andy Armstrong, Hexten
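
The three-tier split described above could be sketched roughly as follows. This assumes A holds only the latest release of each distribution (as a CPAN::Mini style mirror does), B holds the newest `keep` releases as a stand-in for whatever pruning rule is eventually agreed, and C holds every release. The `split_tiers` function, the release tuples, and the `keep` parameter are hypothetical; the thread does not define the pruning rule.

```python
from typing import Dict, List, Tuple

# Hypothetical release record: (distribution name, version tuple, file name).
Release = Tuple[str, Tuple[int, ...], str]


def split_tiers(releases: List[Release], keep: int = 3) -> Dict[str, List[str]]:
    """Assign release files to tiers A (latest only), B (pruned), C (everything)."""
    by_dist: Dict[str, List[Release]] = {}
    for rel in releases:
        by_dist.setdefault(rel[0], []).append(rel)

    tiers: Dict[str, List[str]] = {"A": [], "B": [], "C": []}
    for dist in sorted(by_dist):
        # Newest release first within each distribution.
        ordered = sorted(by_dist[dist], key=lambda r: r[1], reverse=True)
        for rank, (_, _, filename) in enumerate(ordered):
            if rank == 0:
                tiers["A"].append(filename)  # mini mirror: latest release only
            if rank < keep:
                tiers["B"].append(filename)  # pruned mirror: newest `keep` releases
            tiers["C"].append(filename)      # full archive: every release
    return tiers


# Made-up sample data, for illustration only.
sample = [
    ("Foo", (1, 0), "Foo-1.0.tar.gz"),
    ("Foo", (1, 1), "Foo-1.1.tar.gz"),
    ("Foo", (1, 2), "Foo-1.2.tar.gz"),
    ("Bar", (0, 1), "Bar-0.1.tar.gz"),
]
tiers = split_tiers(sample, keep=2)
for name in ("A", "B", "C"):
    print(name, tiers[name])
```

Each tier here is a superset of the one before it, so a client that wants more content can move from A to B to C without changing how it syncs.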