Jim,

The compression ratio is going to be very data dependent. If you can compress to 1/4 of the original size, that's pretty good. At LinkedIn, we compress messages in batches of up to 200 using gzip. The compressed data is about 1/3 of the original size.
Thanks,

Jun

On Thu, Aug 2, 2012 at 5:39 PM, James A. Robinson <jim.robin...@stanford.edu> wrote:
> Hi folks,
>
> We've got a system where we're pushing small XML documents, produced
> as part of an event stream, through kafka to another service. Each of
> these messages tends to be only around 600 to 900 bytes in length.
>
> I was wondering if any of you had statistics on the average
> compression ratio for a given message format you use, when the
> publisher is configured to compress kafka messages using gzip?
>
> I'm expecting that the compression ratio won't be very high if Kafka
> is compressing each individual message (versus compressing entire
> message sets). In our test we were seeing a compression ratio of
> perhaps 25%, and I think that's about what I'd expect for per-message
> compression.
>
> Jim
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> James A. Robinson                       jim.robin...@stanford.edu
> Stanford University HighWire Press      http://highwire.stanford.edu/
> +1 650 7237294 (Work)                   +1 650 7259335 (Fax)
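For anyone following along, the per-message vs. message-set difference is easy to demonstrate outside Kafka with Python's gzip module. The sketch below uses invented ~650-byte XML events (the field names and payload contents are made up for illustration, not taken from the system described above); the point is only that gzipping 200 similar documents as one stream beats gzipping each one individually, since the header overhead amortizes and gzip can match repeated structure across messages.

```python
import gzip
import random

random.seed(42)

def make_event(i):
    # Hypothetical small XML event document, roughly 600-900 bytes,
    # sharing structure with its neighbors but varying in values.
    payload = " ".join(f"field{j}={random.randrange(10**6)}" for j in range(40))
    return (
        f"<event><id>{i}</id>"
        f"<timestamp>2012-08-02T17:39:{i % 60:02d}Z</timestamp>"
        f"<payload>{payload}</payload></event>"
    ).encode("utf-8")

messages = [make_event(i) for i in range(200)]
total = sum(len(m) for m in messages)

# Per-message compression: each document gzipped on its own.
per_msg = sum(len(gzip.compress(m)) for m in messages)

# Message-set compression: the whole batch gzipped as one stream,
# as a producer batching 200 messages would do.
batched = len(gzip.compress(b"".join(messages)))

print(f"per-message: {per_msg / total:.2f} of original size")
print(f"batched:     {batched / total:.2f} of original size")
```

The batched ratio should come out well below the per-message one, which is consistent with modest savings when compressing each ~700-byte document alone versus roughly 1/3 of the original size for batches.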