Did you remember to run your tests repeatedly in different orders to
minimize the effects that caching might have on your results?
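
One way to do this, as a rough sketch rather than a definitive harness:
time each command several times and shuffle the order every round, so no
variant always runs first against a cold cache. The function names
(bench, summarize) and the default of five rounds are my own
assumptions, not anything taken from the original benchmark.

```python
import random
import statistics
import subprocess
import time


def bench(commands, rounds=5, seed=None):
    """Time each command `rounds` times, shuffling the order every round.

    `commands` maps a label to an argv list. Returns a dict mapping each
    label to a list of wall-clock seconds, one entry per round.
    """
    rng = random.Random(seed)
    timings = {label: [] for label in commands}
    labels = list(commands)
    for _ in range(rounds):
        # A different order each round spreads cache effects across variants.
        rng.shuffle(labels)
        for label in labels:
            start = time.perf_counter()
            subprocess.run(commands[label], check=True,
                           stdout=subprocess.DEVNULL)
            timings[label].append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return a dict mapping each label to (min, median) seconds."""
    return {label: (min(t), statistics.median(t))
            for label, t in timings.items()}
```

Here `commands` could map "none", "gzip", and "lz4" to the argv lists
for the three tar-*.sh scripts quoted below. Comparing the minimum or
median across rounds is less sensitive to whichever run happened to
warm the page cache.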

Don

On Fri, Jul 8, 2011 at 4:19 PM, Huan Truong <hnt7...@truman.edu> wrote:

> I've heard a complaint from one guy on another mailing list about gzip
> recently. He was trying to back up tens of GB of data every day, and
> tar-gzipping (tar czvf) was unacceptably slow.
>
> I once faced the same problem when I needed to create hard drive
> snapshots for computers, and obviously I wanted to save bandwidth so
> that I wouldn't have to transfer a lot of data over a 100Mbps line.
>
> Let's suppose we can save 5GB on a 15GB file by compressing that file.
> To transfer 15GB we need 15,000 MB / (100/8) MB/sec = 1,200 secs = 20
> mins on a perfect network. Usually on the Truman network (across
> buildings) it takes 3 times as long. So realistically we need 60
> minutes to transfer a 15GB snapshot image. By compressing, the
> resulting 10GB file would take only 40 mins to transfer. Good deal? No.
>
> It *didn't help*. It takes more than 1 hour to compress that file, so
> the uploading process takes even longer. The clients (Pentium 4 2.8 HT)
> somehow struggle to decompress the file too, so the result comes out
> even. So why the hassle? My conclusion: it's better *not* to compress
> the image with gzip at all. It's even clearer when you have a fast
> connection: the I/O gain is eaten by CPU computation, and the result
> comes out worse.
>
> It turns out gzip, bzip2, and zip are all terrible in CPU usage: they
> take a lot of time to compress and decompress. There are other
> algorithms that compress a little worse than gzip but are much easier
> on the CPU (most of them are based on the Lempel-Ziv algorithm): LZO,
> Google's Snappy, LZF, and LZ4. LZ4 is crazily fast.
>
> I did some quick benchmarking with the Linux source:
>
> 1634!ht:~/src/lz4-read-only$ time ./tar-none.sh ../linux-3.0-rc6 linux-s
> real    0m4.390s
> user    0m0.620s
> sys     0m0.870s
>
> 1635!ht:~/src/lz4-read-only$ time ./tar-gzip.sh ../linux-3.0-rc6 linux-s
> real    0m43.683s
> user    0m40.901s
> sys     0m0.319s
>
> 1636!ht:~/src/lz4-read-only$ time ./tar-lz4.sh ../linux-3.0-rc6 linux-s
> real    0m5.568s
> user    0m4.831s
> sys     0m0.272s
>
> Clear win for lz4! (I used a pipe, so theoretically it can be even
> better.)
>
> I have patched the lz4 utility so that it happily accepts std for stdin
> for infile, and also std for stdout for outfile, so you can pipe from
> whatever program you like.
>
> git clone g...@github.com:htruong/lz4.git for the utility.
>
>
> Cheers, nice weekend,
> - Huan.
> --
> Huan Truong
> 600-988-9066
> http://tnhh.net/
>
>
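
The arithmetic in the quoted message can be checked with a few lines of
Python. This is only a sketch under stated assumptions: 1 GB = 1000 MB
as in the message, compression and transfer run one after the other
rather than pipelined, and the 60-minute compression time is a lower
bound standing in for "more than 1 hour". The function names are
hypothetical.

```python
def transfer_minutes(size_gb, link_mbps=100, slowdown=3):
    """Minutes to send size_gb over the link (1 GB = 1000 MB).

    `slowdown` is the observed factor over the ideal line rate; the
    message reports roughly 3x on the campus network.
    """
    mb_per_sec = link_mbps / 8
    return size_gb * 1000 / mb_per_sec * slowdown / 60


def total_minutes(compressed_gb, compress_min, decompress_min=0):
    """Compress, send, then decompress, assuming the steps do not overlap."""
    return compress_min + transfer_minutes(compressed_gb) + decompress_min


def cpu_budget_minutes(raw_gb, compressed_gb):
    """Transfer time saved; compression pays off only if its CPU time is less."""
    return transfer_minutes(raw_gb) - transfer_minutes(compressed_gb)


print(transfer_minutes(15, slowdown=1))  # 20.0, the ideal-network figure
print(transfer_minutes(15))              # 60.0, uncompressed on the real network
print(total_minutes(10, 60))             # 100.0, gzip assumed to take 60 min
print(cpu_budget_minutes(15, 10))        # 20.0, minutes of CPU time to break even
```

Under these assumptions, compression plus decompression has to cost
less than 20 minutes in total before the smaller file is a net win,
which is why a cheaper compressor can come out ahead where gzip did not.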
