Not all compression formats are created equal: compare the archive sizes at http://sourceforge.net/projects/boost/files/boost/1.46.1/ ... 7zip wins here.
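(If you want to reproduce a comparison like that locally instead of reading sizes off the download page, something along these lines works; the boost_1_46_1 directory name is an assumption, and the 7z step needs p7zip installed:)

# Compress the same unpacked tree with each tool and compare sizes.
tar cf - boost_1_46_1 | gzip -9  > boost.tar.gz
tar cf - boost_1_46_1 | bzip2 -9 > boost.tar.bz2
tar cf - boost_1_46_1 | 7z a -si boost.tar.7z
ls -l boost.tar.gz boost.tar.bz2 boost.tar.7z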
On Fri, Jul 8, 2011 at 17:28, Alex Horn <alex.h...@gmail.com> wrote:
> Well done on taking the initiative. When you do this, you could also
> help those who are interested in reviewing the code by basing your
> work on the upstream project before committing your changes. This way
> it is clearer what has changed, and a link to the diff can be shared.
>
> Cheers,
> Alex
>
> On 8 July 2011 22:36, Don Bindner <don.bind...@gmail.com> wrote:
>> Oh, and you shouldn't use 'std' for stdin and stdout. You should use
>> '-'. That's what many programs do (including gzip, for example); the
>> hyphen will be more familiar to experienced users since it's already
>> an established interface convention.
>> Don
>>
>> On Fri, Jul 8, 2011 at 4:31 PM, Don Bindner <don.bind...@gmail.com> wrote:
>>>
>>> Did you remember to run your tests repeatedly in different orders to
>>> minimize the effect that caching might have on your results?
>>> Don
>>>
>>> On Fri, Jul 8, 2011 at 4:19 PM, Huan Truong <hnt7...@truman.edu> wrote:
>>>>
>>>> I heard a complaint from someone on another mailing list about gzip
>>>> recently. He was trying to back up tens of GB of data every day, and
>>>> tar-gzipping it (tar czvf) was unacceptably slow.
>>>>
>>>> I once faced the same problem when I needed to create hard drive
>>>> snapshots, and obviously I wanted to save bandwidth so that I
>>>> wouldn't have to transfer a lot of data over a 100 Mbps line.
>>>>
>>>> Suppose we can save 5 GB on a 15 GB file by compressing it. To
>>>> transfer 15 GB we need 15,000 MB / (100/8) MB/sec = 1,200 secs = 20
>>>> mins on a perfect network. On the Truman network (cross-building) it
>>>> usually takes three times that, so realistically we need 60 minutes
>>>> to transfer a 15 GB snapshot image. The compressed 10 GB file would
>>>> take only 40 minutes. Good deal? No.
>>>>
>>>> It *didn't help*. Compressing that file takes more than an hour, so
>>>> the whole upload ends up taking even longer. The clients (Pentium 4
>>>> 2.8 HT) also struggle to decompress the file, so at best it comes
>>>> out even. So why the hassle? My conclusion: it's better *not* to
>>>> compress the image with gzip at all. It's even clearer on a fast
>>>> connection: whatever you gain in I/O you pay back in CPU time, and
>>>> the result comes out worse.
>>>>
>>>> It turns out gzip, and likewise bzip2 and zip, are terrible in CPU
>>>> usage: they take a lot of time to compress and decompress. There are
>>>> other algorithms that compress a little worse than gzip but are much
>>>> easier on the CPU (most of them based on the Lempel-Ziv family):
>>>> LZO, Google's Snappy, LZF, and LZ4. LZ4 is crazily fast.
>>>>
>>>> I did some quick benchmarking with the Linux kernel source:
>>>>
>>>> 1634!ht:~/src/lz4-read-only$ time ./tar-none.sh ../linux-3.0-rc6 linux-s
>>>> real 0m4.390s
>>>> user 0m0.620s
>>>> sys 0m0.870s
>>>>
>>>> 1635!ht:~/src/lz4-read-only$ time ./tar-gzip.sh ../linux-3.0-rc6 linux-s
>>>> real 0m43.683s
>>>> user 0m40.901s
>>>> sys 0m0.319s
>>>>
>>>> 1636!ht:~/src/lz4-read-only$ time ./tar-lz4.sh ../linux-3.0-rc6 linux-s
>>>> real 0m5.568s
>>>> user 0m4.831s
>>>> sys 0m0.272s
>>>>
>>>> A clear win for lz4! (I used a pipe, so in theory it could be even
>>>> better.)
>>>>
>>>> I have patched the lz4 utility so that it happily accepts 'std' for
>>>> stdin as the infile and 'std' for stdout as the outfile, so you can
>>>> pipe to and from whatever program you like.
>>>>
>>>> git clone g...@github.com:htruong/lz4.git to get the utility.
>>>>
>>>> Cheers, nice weekend,
>>>> - Huan.
>>>>
>>>> --
>>>> Huan Truong
>>>> 600-988-9066
>>>> http://tnhh.net/
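(For anyone who wants to reproduce the tar+lz4 comparison: the tar-*.sh wrappers aren't included in the thread, but a minimal sketch like the one below should be close. The argument handling of the patched lz4 binary is an assumption on my part, based only on the 'std' convention described in the quoted message.)

#!/bin/sh
# tar-lz4.sh (sketch): tar a directory and pipe the stream through lz4.
# Usage: ./tar-lz4.sh <source-dir> <output-basename>
# Assumes the patched lz4 from github.com/htruong/lz4, where passing
# "std" as the infile argument means "read from stdin".
set -e
src="$1"
out="$2"
tar cf - "$src" | ./lz4 std "$out.tar.lz4"

tar-gzip.sh would be the same pipeline with gzip -c > "$out.tar.gz" in place of the lz4 stage, and tar-none.sh is just tar cf "$out.tar" "$src". If the '-' convention Don suggests gets adopted, the lz4 stage would become ./lz4 - "$out.tar.lz4".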