Re: Using a better compression than .gz for one's CPAN modules

2010-11-22 Thread Andreas J. Koenig
 On Sat, 20 Nov 2010 23:22:52 +0100, Aristotle Pagaltzis pagalt...@gmx.de said:

   It’s gonna be a lot of work to iron out the entire tool chain to
   support the newer formats; then it will take a lot of time until
   the work trickles out far enough that people could start relying
   on it.

In the case of bzip2 I couldn't resist after having watched bzip2's
acceptance for several years. So I prodded all toolchain authors to
support bz2. It is now done and seems to work fine.

   For quite piddly gains, in absolute numbers.

   I really don’t see the point. Gzip is Good Enough.

Agreed, but since bzip2 support is already done we can welcome it when
people actually use it.

-- 
andreas


Re: Using a better compression than .gz for one's CPAN modules

2010-11-22 Thread Aristotle Pagaltzis
* Andreas J. Koenig andreas.koenig.7os6v...@franz.ak.mind.de [2010-11-22 09:20]:
 Agreed, but since bzip2 support is already done we can welcome
 it when people actually use it.

I am unwilling to encourage it but I won’t argue if someone wants
to use it. And it can be a win for distributions with very large
bundled data files so one might as well use it for them since the
support exists. I just don’t want to see a campaign against gzip.
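
For anyone who wants to check whether a given bundled data file is one of those cases, a rough sketch in Python follows. It only compares output sizes using the standard gzip and bz2 modules; the function name and the sample data are made up for illustration, and real distributions would be measured on their actual files.

```python
import bz2
import gzip


def compressed_sizes(data: bytes) -> dict:
    """Return the raw size and the gzip/bzip2 sizes of `data`, in bytes."""
    return {
        "raw": len(data),
        "gz": len(gzip.compress(data, compresslevel=9)),
        "bz2": len(bz2.compress(data, compresslevel=9)),
    }


if __name__ == "__main__":
    # Hypothetical stand-in for a large bundled data file.
    sample = b"2010-11-22,cpan,mirror,ok\n" * 50_000
    for name, size in compressed_sizes(sample).items():
        print(f"{name:>4}: {size} bytes")
```

Whether the difference matters depends on the data; for ordinary source trees the absolute saving may well be as small as argued earlier in the thread.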

Regards,
-- 
Aristotle Pagaltzis // http://plasmasturm.org/


Reducing rsync cost (was: Re: Using a better compression than .gz for one's CPAN modules)

2010-11-22 Thread David Landgren

On 19/11/2010 20:57, dhu...@hudes.org wrote:

source code, even 100KLOC? Once you go to .gz you're already at better
than 2:1. What are you going to save by going to even 3:1, 10Kbytes?
Compared to the nuisance inflicted, it's nothing.


Over the entire CPAN archive, it'd be significant...

I agree on the individual case it's probably not worth worrying about too
much.  But if it's easy to use .bz2 or something better it wouldn't hurt
to get that word out.  (And it may be worth making it easy, though I'm not
sure about that.)

Daniel T. Staal


Disk space is cheap. Bandwidth is cheap. What's rough is the rsync between
mirrors. Compressing to .bz2 won't help that: the stress is doing a stat
on every single file in CPAN not the transfer. Work toward optimizing the
mirror distribution instead of worrying about bz2 vs gz.  Remember not


Yeah, this is the killer. In an ideal world, we would kill the symlinks 
such as authors/id/*, modules/by-category/*, modules/by-module/* and so 
on. These could be recreated via shell scripts locally on mirrors for 
people who wish to maintain these legacies. Cutting that out would 
diminish the rsync burden considerably.
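
A minimal sketch of what such a local rebuild could look like, written in Python rather than shell for portability. It assumes the mirror operator already has a mapping from link path to target path (both relative to the mirror root); where that mapping comes from is left open, and the function name and example paths are hypothetical.

```python
import os


def recreate_symlinks(root, links):
    """Create each symlink in `links` under `root`.

    `links` maps a link path to its target path, both relative to `root`.
    Returns the list of link paths that were created or repointed.
    """
    created = []
    for link_rel, target_rel in sorted(links.items()):
        link_path = os.path.join(root, link_rel)
        link_dir = os.path.dirname(link_path)
        os.makedirs(link_dir, exist_ok=True)
        # Use a relative target so the tree can be moved as a whole.
        rel_target = os.path.relpath(os.path.join(root, target_rel), link_dir)
        if os.path.islink(link_path):
            if os.readlink(link_path) == rel_target:
                continue  # already correct, nothing to do
            os.remove(link_path)
        # Raises FileExistsError if a regular file is in the way.
        os.symlink(rel_target, link_path)
        created.append(link_rel)
    return created
```

Running it twice is harmless: the second pass finds the links in place and creates nothing, so it could be run after every mirror sync.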


David

--
There's bum trash in my hall and my place is ripped
I've totaled another amp, I'm calling in sick


Re: Reducing rsync cost (was: Re: Using a better compression than .gz for one's CPAN modules)

2010-11-22 Thread David Nicol
On Mon, Nov 22, 2010 at 4:37 AM, David Landgren da...@landgren.net wrote:
 Yeah, this is the killer. In an ideal world, we would kill the symlinks such
 as authors/id/*, modules/by-category/*, modules/by-module/* and so on. These
 could be recreated via shell scripts locally on mirrors for people who wish
 to maintain these legacies. Cutting that out would diminish the rsync burden
 considerably.

 David

or re-engineer CPAN as a sqlite+FTSE database, and re-engineer the
mirroring process as a database mirror via a TBD compact database diff
protocol (I have no intention of doing any of this myself; good
morning)
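
Purely as an illustration of the idea, and not a proposal for the actual protocol, here is a small Python sketch: each mirror keeps a sqlite table of file metadata, and a sync compares two such indexes instead of stat-ing every file. The schema, the function names, and the naive in-memory comparison are all assumptions; the full-text-search part is left out.

```python
import sqlite3


def build_index(entries):
    """Create an in-memory index from (path, size, mtime) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, size INTEGER, mtime INTEGER)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", entries)
    conn.commit()
    return conn


def index_diff(old, new):
    """Return (changed_or_added, removed) paths between two indexes."""
    query = "SELECT path, size, mtime FROM files"
    old_rows = {row[0]: row[1:] for row in old.execute(query)}
    new_rows = {row[0]: row[1:] for row in new.execute(query)}
    # A path needs fetching if it is new or its metadata differs.
    changed = sorted(p for p, meta in new_rows.items() if old_rows.get(p) != meta)
    removed = sorted(p for p in old_rows if p not in new_rows)
    return changed, removed
```

A mirror would then fetch only the paths in the first list and delete those in the second; how the index itself gets shipped compactly is the TBD part.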

-- 
It is merely a matter of persistence. -- Albert Camus