Did you see the top comments in bmz.c:

/**
 * An effective/efficient block compressor for input containing long common
 * strings (e.g. web pages from a website)
 *
 * cf. Bentley & McIlroy, "Data Compression Using Long Common Strings", 1999
 * cf. BMDiff & Zippy mentioned in the Bigtable paper
 */

The B&M paper is available online if you search for it. BMZ by default is essentially the BM algorithm plus LZO, but the library is flexible enough to allow other combinations.

On Mar 14, 4:02 pm, Mateusz Berezecki <[email protected]> wrote:
> I've been trying to figure out what kind of compression algorithm is
> BMZ and failed. So could someone please give me some references or
> pointers to literature (can be online) to the BMZ algorithm
> explanation, etc?
>
> The second thought I had was to ask if LZMA was considered for
> compression? What was the original criterion for selecting supported
> compression algorithms?

The main criterion is the throughput for encoding/decoding typical commit log and cellstore blocks (the default compressed block size is 64KB, about 100-200KB raw). LZMA (much slower than bzip2, which is much slower than gzip, which is much slower than bmz and lzo) and bzip2 are considered too slow, and their compression advantage is not that big for relatively small blocks, as both LZMA and bzip2 take advantage of large (many MBs) buffers. Of course, you're welcome to experiment with other compression options (I hope our BlockCompressionCodec API is easy enough for you to extend :)

My BM implementation is experimental in nature (but seems stable enough from random tests) and hardly tuned (except for avoiding modulo in Rabin-Karp hash table lookups). I think profiling and tuning would make it a lot faster (it's already about 4-5x faster than gzip on various inputs).

__Luke
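
A note on the "avoiding modulo" remark, for anyone who wants to experiment: the usual way to do this is to let the rolling hash wrap around in a 32-bit unsigned integer and to keep the hash table size a power of two, so the slot index is a bit mask rather than a `%`. The sketch below is only an illustration of that general trick and is not taken from bmz.c; the names `rk_hash`, `rk_top`, `rk_roll`, `rk_slot` and the base 257 are all made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical polynomial base; bmz.c may use something else entirely. */
#define RK_BASE 257u

/* Hash of s[0..len-1]: sum of s[i] * RK_BASE^(len-1-i), reduced mod 2^32
 * for free by unsigned 32-bit wraparound. */
static uint32_t rk_hash(const unsigned char *s, size_t len) {
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * RK_BASE + s[i];
    return h;
}

/* RK_BASE^(len-1) mod 2^32: the weight of the oldest byte in the window. */
static uint32_t rk_top(size_t len) {
    uint32_t p = 1;
    for (size_t i = 1; i < len; i++)
        p *= RK_BASE;
    return p;
}

/* Slide the window by one byte: remove `out` (oldest), append `in` (newest). */
static uint32_t rk_roll(uint32_t h, uint32_t top,
                        unsigned char out, unsigned char in) {
    return (h - (uint32_t)out * top) * RK_BASE + in;
}

/* Table slot without a division: valid only when table_size is a power of
 * two, in which case h & (table_size - 1) equals h % table_size. */
static size_t rk_slot(uint32_t h, size_t table_size) {
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return h & (table_size - 1);
}
```

To use it, compute `rk_hash` once for the first window, then call `rk_roll` for each following byte and `rk_slot` to index the table. The cost of the mask is that the table size is restricted to powers of two.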
