On 08/19/2012 09:09 AM, Vincent Diepeveen wrote:
> Here is the results:
>
> The original file used for every compressor. A small EGTB of 1.8GB:
>
> -rw-rw-r--. 1 diep diep 1814155128 Aug 19 10:37 knnknp_w.dtb
>
> LZO (default compression):
>
> -rw-rw-r--. 1 diep diep 474233006 Aug 19 10:37 knnknp_w.dtb.lzo
>
> 7-zip (default compression):
>
> -rw-rw-r--. 1 diep diep 160603822 Aug 18 19:33 ../7z/33p/knnknp_w.dtb.7z
>
> Andrew Kadatch:
>
> -rw-rw-r--. 1 diep diep 334258087 Aug 19 14:37 knnknp_w.dtb.emd
>
> We see kadatch is a 140MB smaller in size than LZO, that's a lot at
> 474MB total size for the lzo
> and it's 10% of total size of the original data.
>
> So LZO in fact is so bad it doesn't even beat another Huffman
> compressor. A fast bucket compressor not using a dictionary at all is
> hammering it.

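For a quick side-by-side, the sizes quoted above can be turned into fractions of the original file. This is only a minimal Python sketch: the byte counts are copied from the ls listing, while the variable and function names are purely illustrative.

```python
# Byte counts copied from the ls output quoted above.
ORIGINAL = 1814155128
compressed = {
    "lzo": 474233006,
    "7z": 160603822,
    "kadatch": 334258087,
}


def fraction_of_original(size, original=ORIGINAL):
    """Compressed size as a fraction of the uncompressed size."""
    return size / original


ratios = {name: fraction_of_original(size) for name, size in compressed.items()}

# Absolute gap between the LZO and Kadatch outputs, in bytes.
gap_bytes = compressed["lzo"] - compressed["kadatch"]

# Print smallest output first.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:8s} {compressed[name]:>12d} bytes  {ratio:6.1%} of original")
print(f"lzo - kadatch: {gap_bytes} bytes ({gap_bytes / ORIGINAL:.1%} of original)")
```

On these numbers the LZO output is about 26% of the original, Kadatch's about 18%, and 7-zip's about 9%. The LZO/Kadatch gap is roughly 140 MB, which is closer to 7.7% of the original file than 10%.
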
Thanks for these insightful findings, Vincent. Unless I missed something, I didn't see timings for these algorithms. I would be very interested to see each of these compressions wrapped in a 'time' command, and please make sure to flush your buffer cache between runs.

In Hadoop, LZO seems to be the de facto standard because of its speed in both compression and decompression and its relatively high compression ratio compared to very bare-bones compressors. So seeing these results alongside 1) the time to compress when the data is solely on HDD and 2) the time to decompress when the data is solely on HDD would be really, really helpful.

For Hadoop, compression is mainly used to "package" data up prior to network transfer (and obviously it gets "unpackaged" on the other side if it needs to be used), so the trade-off between speed and compression ratio is a fine one that depends on your network and CPU capabilities.

Please let me know if you get around to running these experiments. If you find another compressor out there that is excellent, I'll have to consider it for my use in Hadoop!

Best,

ellis
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf