On Mar 26, 2010, at 8:23 PM, Arthur Corliss wrote:
>
> Sure, I don't run a CPAN mirror, but I do manage many, many terrabytes of
> storage as part of my day job. I think it's a tad presumptuous to disregard
> input just because we're not in your inner sanctum. As I mentioned in a
> follow up e-mail: this is simply a matter of selecting the correct problem
> domain. I believe that streamlining the mirroring process will provide
> greater gains for less effort.
>
> That's not to say that pursuing other efficiencies isn't worthwhile, just
> that you need to prioritize.
>
> But what the hell do I know. I don't run a *CPAN* mirror, so I must be
> freaking clueless...

Oh, don't be such a drama queen.

I rebuilt and helped run nic.funet.fi, the canonical mirror for a large number of mirrors, for 2 years. The perspective of having a few terabytes spinning in storage changes quite dramatically when you are actually serving a few terabytes to thousands of clients. CPAN grew to be quite a burden on the site, not only because of the high demand but also because of the multitude of small files, and I'm sure other mirrors feel similarly burdened.

The sort of pruning Tim brought up has long been an idea, but with the current and growing size of the archive, something does need to be done to alleviate the burden, not only on the canonical mirrors but also on the random folks who want to grab a local mirror for themselves. In my present work environment, 12 GB isn't a lot of disk space, but it's a lot considering I don't need to install Perl modules daily and will likely never use the vast majority of them. It would be a kindness to both the mirror operators and the end users to trim it down to a manageable size.

As for efficiency, rsync remains a good tool for the job, and it works on nearly every platform, which is a rather tall order to match with any other solution. Relegating the cruft to BackPAN to make the current CPAN slimmer and less demanding on all fronts is an idea that would be welcomed by more than just mirror ops.

The only snag I can foresee in trimming back on the abundance of modules is the case where a module requires a specific version of another module and barfs on a mismatched or newer version of it. (I bumped into this recently but can't remember exactly which module it was.) I think it's rare, though, and the practice should be discouraged.

e.