Did you remember to run your tests repeatedly, in different orders, to minimize the effect that caching might have on your results?
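To make that concrete, here is a rough sketch of what I mean, not a polished harness. The function names `bench` and `summarize` are made up for illustration, and the script names in the trailing comment are simply the ones from the quoted message; I am assuming they take the same arguments on every run. Shuffling does not flush the page cache. It only keeps one command from always being the one that pays for the cold read.

```python
import random
import statistics
import subprocess
import time
from collections import defaultdict


def bench(commands, rounds=5, seed=None):
    """Time each command `rounds` times, reshuffling the run order every round.

    `commands` maps a label to an argv list. Returns {label: [seconds, ...]}.
    """
    rng = random.Random(seed)
    timings = defaultdict(list)
    names = list(commands)
    for _ in range(rounds):
        rng.shuffle(names)  # new order each round, so no command always runs first
        for name in names:
            start = time.perf_counter()
            subprocess.run(commands[name], check=True,
                           stdout=subprocess.DEVNULL)
            timings[name].append(time.perf_counter() - start)
    return dict(timings)


def summarize(timings):
    """Return {label: (fastest, median)} wall-clock seconds."""
    return {name: (min(runs), statistics.median(runs))
            for name, runs in timings.items()}


# Hypothetical usage with the scripts from the quoted message:
# commands = {
#     "none": ["./tar-none.sh", "../linux-3.0-rc6", "linux-s"],
#     "gzip": ["./tar-gzip.sh", "../linux-3.0-rc6", "linux-s"],
#     "lz4":  ["./tar-lz4.sh",  "../linux-3.0-rc6", "linux-s"],
# }
# print(summarize(bench(commands, rounds=5)))
```

If the median and the fastest run for a command are far apart, caching (or something else on the machine) is probably influencing the numbers.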
Don

On Fri, Jul 8, 2011 at 4:19 PM, Huan Truong <hnt7...@truman.edu> wrote:
> I recently heard a complaint from someone on another mailing list about
> gzip. He was trying to back up tens of GB of data every day, and
> tar-gzipping (tar czvf) was unacceptably slow.
>
> I once faced the same problem when I needed to create hard drive
> snapshots for computers, and obviously I wanted to save bandwidth so
> that I wouldn't have to transfer a lot of data over a 100Mbps line.
>
> Let's suppose we can save 5GB on a 15GB file by compressing that file.
> To transfer 15GB we need 15,000 MB / (100/8) MB/sec = 1,200 secs = 20
> mins on a perfect network. Usually on the Truman network (cross-building)
> it takes 3 times as long. So realistically we need 60 minutes to
> transfer a 15GB snapshot image. By compressing, the resulting 10GB file
> would take only 40 mins to transfer. Good deal? No.
>
> It *didn't help*. It takes more than 1 hour to compress that file, so
> the uploading process takes even longer. The clients (Pentium 4 2.8 HT)
> somehow struggle to decompress the file too, so the result comes out
> even. So why the hassle? My conclusion: it's better *not* to compress
> the image with gzip at all. It's even clearer when you have a fast
> connection: the IO gain goes to CPU computation, and the result comes
> out worse.
>
> It turns out gzip, and also bzip2 and zip, are terrible in CPU usage:
> they take a lot of time to compress and decompress. There are other
> algorithms that compress a little bit worse than gzip but are much
> easier on the CPU (most of them are based on the Lempel-Ziv algorithm):
> LZO, Google's Snappy, LZF, and LZ4. LZ4 is crazily fast.
>
> I did some quick benchmarking with the Linux source:
>
> 1634!ht:~/src/lz4-read-only$ time ./tar-none.sh ../linux-3.0-rc6 linux-s
> real 0m4.390s
> user 0m0.620s
> sys 0m0.870s
>
> 1635!ht:~/src/lz4-read-only$ time ./tar-gzip.sh ../linux-3.0-rc6 linux-s
> real 0m43.683s
> user 0m40.901s
> sys 0m0.319s
>
> 1636!ht:~/src/lz4-read-only$ time ./tar-lz4.sh ../linux-3.0-rc6 linux-s
> real 0m5.568s
> user 0m4.831s
> sys 0m0.272s
>
> Clear win for lz4! (I used a pipe, so theoretically it can be even
> better.)
>
> I have patched the lz4 utility so that it would happily accept std for
> stdin for infile, and also std for stdout for outfile, so you can pipe
> from whatever program you like.
>
> git clone g...@github.com:htruong/lz4.git for the utility.
>
>
> Cheers, nice weekend,
> - Huan.
> --
> Huan Truong
> 600-988-9066
> http://tnhh.net/
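
For reference, the transfer-time arithmetic in the quoted message can be written down as a small sketch. The function names are made up for illustration. I am assuming 1 GB = 1,000 MB as the message does, that compress, transfer, and decompress run one after another rather than pipelined, and I use 60 minutes as a stand-in for the "more than 1 hour" compression time.

```python
def transfer_minutes(size_gb, link_mbps=100, slowdown=3.0):
    """Minutes to send size_gb over the link.

    slowdown is the observed factor over the ideal rate (3x in the message).
    """
    mb_per_sec = link_mbps / 8            # 100 Mbps -> 12.5 MB/sec
    seconds = size_gb * 1000 / mb_per_sec * slowdown
    return seconds / 60


def total_minutes(raw_gb, compressed_gb, compress_min, decompress_min=0.0):
    """End-to-end minutes if every step runs sequentially (an assumption)."""
    return compress_min + transfer_minutes(compressed_gb) + decompress_min


raw = transfer_minutes(15)                        # 60.0 minutes uncompressed
gzipped = total_minutes(15, 10, compress_min=60)  # 100.0 minutes, at least
budget = raw - transfer_minutes(10)               # 20.0 minutes of CPU time to spare

print(raw, gzipped, budget)
```

Under these assumptions, compression only pays off when compressing plus decompressing takes less than the 20 minutes saved on the wire, which is why a faster codec changes the answer.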