Casey,

You are right that the high-level (Zookeeper) consumer deals with broker failures: a "rebalance" is triggered to re-allocate the consumption of partitions evenly across the consumers in the consumer group. Furthermore, if you set autocommit.enable to true, it will commit the consumed offsets for each partition to Zookeeper at a configurable interval, which should address your second concern. If you use the SimpleConsumer, on the other hand, you will need to manage consumed offsets manually and handle broker outages yourself.
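
To make that concrete, here is a rough sketch of a high-level consumer with autocommit turned on. It is based on the 0.7-era Java API from memory, so treat the class names, import paths, and property names as approximate rather than definitive; "my-topic" and "my-group" are placeholders:

// Rough sketch of the high-level (Zookeeper) consumer with autocommit on,
// based on the 0.7-era Java API; names and paths may differ across releases.
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaMessageStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.Message;

public class HighLevelConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zk.connect", "localhost:2181");    // Zookeeper ensemble
        props.put("groupid", "my-group");             // consumer group for rebalancing
        props.put("autocommit.enable", "true");       // commit offsets to Zookeeper...
        props.put("autocommit.interval.ms", "10000"); // ...every 10s (configurable)

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // One stream for "my-topic"; a rebalance re-allocates partitions across
        // all streams in the group whenever brokers or consumers come and go.
        Map<String, List<KafkaMessageStream<Message>>> streams =
            connector.createMessageStreams(Collections.singletonMap("my-topic", 1));

        for (Message message : streams.get("my-topic").get(0)) {
            // process the message; its offset will be committed automatically
        }
    }
}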
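
For contrast, here is a rough sketch of the SimpleConsumer side, where tracking and persisting the consumed offset is entirely up to the application (same caveat about the 0.7-era API; the durable store for offsets is left as a comment since it is application-specific):

// Rough sketch of manual offset tracking with the SimpleConsumer (0.7-era
// Java API; accessors such as offset() may be plain fields in some builds).
import kafka.api.FetchRequest;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.javaapi.message.ByteBufferMessageSet;
import kafka.message.MessageAndOffset;

public class SimpleConsumerSketch {
    public static void main(String[] args) {
        // host, port, socket timeout (ms), buffer size (bytes)
        SimpleConsumer consumer = new SimpleConsumer("localhost", 9092, 10000, 1024000);

        long offset = 0L; // on restart, re-seed this from your durable store
        while (true) {
            // topic, partition, starting offset, max bytes to fetch
            FetchRequest request = new FetchRequest("my-topic", 0, offset, 1000000);
            ByteBufferMessageSet messages = consumer.fetch(request);
            for (MessageAndOffset mo : messages) {
                // process mo.message(), then persist the new offset durably so a
                // restarted consumer does not re-process this message
                offset = mo.offset(); // offset to use for the next fetch
            }
        }
    }
}

Note that this loop does nothing about broker failures either; with the SimpleConsumer, detecting outages and reconnecting is also up to you.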
Thanks,

Joel

On Thu, Nov 17, 2011 at 10:11 AM, Sybrandy, Casey <caseysybra...@noviidesign.com> wrote:
> Hello,
>
> I have a couple of questions about consumers.
>
> 1) What's the preferred method of writing a consumer: the SimpleConsumer or
> the Zookeeper consumer? I'm guessing the Zookeeper one allows the consumer
> to handle failures within the Kafka cluster, e.g., if one node goes down, the
> consumer will then pull from the replicated node.
>
> 2) What's the proper way to track which messages have been processed by a
> consumer? The scenario I'm looking at is if a consumer dies and we later
> restart it. What we don't want happening is the consumer re-processing
> records that have already been processed.
>
> What I'm basically looking for are best practices for setting up a system
> where we have to handle a high volume of traffic.
>
> Thanks.
>
> Casey