Ross,

With compression enabled, exactly-once semantics are a bit harder to implement, because the offset is only advanced after a whole compressed batch of messages has been consumed. So you will have to make sure that each batch of messages is consumed together as a unit and that the offset is persisted only at batch boundaries. The other option is to compress with a batch size of 1.
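
A minimal sketch of the batch-as-a-unit approach, written in Python for brevity rather than against the real SimpleConsumer API. It assumes the behaviour described in the quoted message below holds: every message in a compressed batch except the last carries the offset of the batch start, and the last one carries the offset of the next batch. The names `consume_in_batches` and `apply_batch` are hypothetical, and `apply_batch` is assumed to persist the consumer state and the offset together in one atomic step.

```python
from typing import Callable, Iterable, List, Tuple


def consume_in_batches(
    messages: Iterable[Tuple[bytes, int]],
    start_offset: int,
    apply_batch: Callable[[List[bytes], int], None],
) -> int:
    """Group (payload, offset) pairs into batches and commit per batch.

    `messages` stands in for the MessageAndOffset stream (an assumption,
    not the Kafka API). `start_offset` is the offset the fetch began at.
    Returns the last offset that was handed to `apply_batch`.
    """
    batch_start = start_offset
    pending: List[bytes] = []
    for payload, offset in messages:
        pending.append(payload)
        if offset != batch_start:
            # The offset moved past the batch start, so this message is
            # the last one in its batch. Commit the whole batch at once.
            apply_batch(pending, offset)
            pending = []
            batch_start = offset
    # Messages still in `pending` belong to an incomplete batch. They are
    # dropped here and will be re-read from `batch_start` on restart.
    return batch_start
```

With a batch size of 1, every message moves the offset, so the same loop commits after each message.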

Thanks,
Jun

On Thu, May 31, 2012 at 8:05 PM, Ross Black <ross.w.bl...@gmail.com> wrote:

> Hi,
>
> Using SimpleConsumer, I get the offset of a message (from MessageAndOffset)
> and persist it with my consumer data to get exactly-once semantics for
> consumer state (as described in the kafka design docs). If the consumer
> fails then it is simply a matter of starting replay of messages from the
> persisted index.
>
> When using compression, the offset from MessageAndOffset appears to be the
> offset of the compressed batch. e.g. For a batch of 10 messages, the
> offset returned for messages 1-9 is the start of the *current* batch, and
> the offset for message 10 is the start of the *next* batch.
>
> How can I get the exactly-once semantics for consumer state?
> Is there a way that I can get a batch of messages from SimpleConsumer?
> (otherwise I have to reconstruct a batch by watching for a change in the
> offset between messages)
>
> Thanks,
> Ross