Not all compression algorithms are created equal on all kinds of data:
http://sourceforge.net/projects/boost/files/boost/1.46.1/ ... 7zip
wins here.

On Fri, Jul 8, 2011 at 17:28, Alex Horn <alex.h...@gmail.com> wrote:
> Well done on taking the initiative. When you do this you could also
> help those who are interested in reviewing the code by basing your
> work on the upstream project before committing your changes. This way
> it is clearer what has changed and a link to the diff can be shared.
>
> Cheers,
> Alex
>
> On 8 July 2011 22:36, Don Bindner <don.bind...@gmail.com> wrote:
>> Oh, and you shouldn't use 'std' for stdin and stdout.  You should use
>> '-'.  That's what many programs do (gzip, for example); the hyphen will
>> be more familiar to experienced users since it's already an established
>> interface convention.
>> Don
>>
>> On Fri, Jul 8, 2011 at 4:31 PM, Don Bindner <don.bind...@gmail.com> wrote:
>>>
>>> Did you remember to run your tests repeatedly in different orders to
>>> minimize the effects that caching might have on your results?
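
On the caching point: besides shuffling the run order, one blunt but
effective way to level the field on Linux is to flush the page cache
between runs (needs root):

  sync
  echo 3 | sudo tee /proc/sys/vm/drop_caches
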
>>> Don
>>>
>>> On Fri, Jul 8, 2011 at 4:19 PM, Huan Truong <hnt7...@truman.edu> wrote:
>>>>
>>>> I heard a complaint from someone on another mailing list about gzip
>>>> recently. He was trying to back up tens of GB of data every day, and
>>>> tar-gzipping it (tar czvf) was unacceptably slow.
>>>>
>>>> I once faced the same problem when I needed to create hard drive
>>>> snapshots, and obviously I wanted to save bandwidth so that I wouldn't
>>>> have to transfer a lot of data over a 100Mbps line.
>>>>
>>>> Let's suppose we can save 5GB on a 15GB file by compressing it. To
>>>> transfer 15GB we need 15,000 MB / (100/8) MB/sec = 1,200 secs = 20
>>>> mins on a perfect network. On the Truman network (cross-building) it
>>>> usually takes three times as long, so realistically we need 60 minutes
>>>> to transfer a 15GB snapshot image. After compression, the resulting
>>>> 10GB file would take only 40 mins to transfer. Good deal? No.
>>>>
>>>> It *didn't help*. It takes more than an hour to gzip the file, so the
>>>> whole upload ends up taking even longer. The clients (Pentium 4 2.8
>>>> HT) also struggle to decompress the file, so at best it breaks even.
>>>> So why the hassle? My conclusion: it's better *not* to compress the
>>>> image with gzip at all. It's even clearer with a fast connection:
>>>> whatever you gain in I/O you pay for in CPU time, and the total comes
>>>> out worse.
>>>>
>>>> It turns out gzip, and likewise bzip2 and zip, is terrible in CPU
>>>> usage: it takes a lot of time to compress and decompress. There are
>>>> other algorithms that compress a little worse than gzip but are much
>>>> easier on the CPU (most of them based on the Lempel-Ziv algorithm):
>>>> LZO, Google's Snappy, LZF, and LZ4. LZ4 is crazily fast.
>>>>
>>>> I did some quick benchmarking with the Linux source:
>>>>
>>>> 1634!ht:~/src/lz4-read-only$ time ./tar-none.sh ../linux-3.0-rc6 linux-s
>>>> real    0m4.390s
>>>> user    0m0.620s
>>>> sys     0m0.870s
>>>>
>>>> 1635!ht:~/src/lz4-read-only$ time ./tar-gzip.sh ../linux-3.0-rc6 linux-s
>>>> real    0m43.683s
>>>> user    0m40.901s
>>>> sys     0m0.319s
>>>>
>>>> 1636!ht:~/src/lz4-read-only$ time ./tar-lz4.sh ../linux-3.0-rc6 linux-s
>>>> real    0m5.568s
>>>> user    0m4.831s
>>>> sys     0m0.272s
>>>>
>>>> Clear win for lz4! (I used a pipe, so theoretically it could be even
>>>> better.)
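
The benchmark scripts aren't shown, but judging from the command lines
above, tar-lz4.sh presumably does something along these lines (a guess,
not Huan's actual script; $1 is the source tree, $2 the output name):

  #!/bin/sh
  # tar the tree to stdout and pipe it into the patched lz4
  # ('std' as infile = read from stdin, second arg = output file)
  tar cf - "$1" | ./lz4 std "$2.lz4"
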
>>>>
>>>> I have patched the lz4 utility so that it happily accepts 'std' as
>>>> the infile (meaning stdin) and as the outfile (meaning stdout), so you
>>>> can pipe from and to whatever program you like.
>>>>
>>>> git clone g...@github.com:htruong/lz4.git for the utility.
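
Assuming the patched binary behaves as described ('std' accepted for
both infile and outfile), the snapshot-upload case could become a single
pipeline, something like (device and host names made up, run as root for
the raw device):

  dd if=/dev/sda2 bs=1M | ./lz4 std std | ssh backuphost 'cat > snapshot.lz4'
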
>>>>
>>>>
>>>> Cheers, nice weekend,
>>>> - Huan.
>>>> --
>>>> Huan Truong
>>>> 600-988-9066
>>>> http://tnhh.net/
>>>>
>>>
>>
>>
>
