Evan,

Please look at autocommit.enable at
http://incubator.apache.org/kafka/configuration.html

If it is false, you can control the offset storage via the commitOffsets
API call.
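The commit-after-ack pattern being described can be sketched as follows. This is a minimal simulation, not real Kafka code: `FakeConsumer`, `poll`, `deliver`, and `commit_offsets` are invented stand-ins for the example, and only `autocommit.enable` and `commitOffsets()` name the real config option and API call discussed in this thread.

```python
# Sketch of "disable autocommit, commit only after an ack" (simulated;
# a real consumer would set autocommit.enable=false and call
# commitOffsets() -- the only two names here taken from Kafka).

class FakeConsumer:
    """Stand-in for a high level consumer with autocommit disabled."""
    def __init__(self, messages):
        self.messages = list(messages)
        self.position = 0            # next offset to hand out
        self.committed = 0           # last checkpointed offset

    def poll(self):
        if self.position < len(self.messages):
            msg = self.messages[self.position]
            self.position += 1
            return msg
        return None

    def commit_offsets(self):        # plays the role of commitOffsets()
        self.committed = self.position

def deliver(msg):
    """Downstream send; returns True when the destination acks."""
    return msg != "bad"              # pretend "bad" is never acked

consumer = FakeConsumer(["a", "b", "c"])
while (msg := consumer.poll()) is not None:
    if deliver(msg):
        consumer.commit_offsets()    # checkpoint only after the ack

print(consumer.committed)  # 3: every message was acked and committed
```

If a message is never acked, the checkpoint simply stops advancing, which is what lets the consumer roll back and retry from the last committed offset.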
>> So, commit the offset when you have an ack, however that is defined;
>> rollback to an earlier offset when you don't get acks,
>> and de-dup as necessary.

Sounds like you can use commitOffsets() right after getting an ack.

Thanks,
Neha

On Thu, Dec 8, 2011 at 12:44 PM, Evan Chan <e...@ooyala.com> wrote:
> What you mean is that we need to modify (have our own modified copy of)
> the high level consumer (specifically the ConsumerConnector) so that
> instead of automatically calling commitOffsets(), we can call
> commitOffsets() at our own discretion, when we know that the messages
> have gotten to their destination.
>
> I am planning to do this, BTW, for a similar use case.
> Exactly once == at least once + de-duplication.
> So, commit the offset when you have an ack, however that is defined;
> rollback to an earlier offset when you don't get acks,
> and de-dup as necessary.
>
> -Evan
>
>
> On Thu, Dec 8, 2011 at 10:03 AM, Jun Rao <jun...@gmail.com> wrote:
>
> > Neha is right. It's possible to achieve exactly-once delivery even in
> > the high level consumer. What you have to do is make sure all consumed
> > messages are really consumed and then call commitOffsets(). When you
> > call commitOffsets(), all messages returned to the app should have
> > been fully consumed or put in a safe place.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Dec 8, 2011 at 9:52 AM, Neha Narkhede
> > <neha.narkh...@gmail.com> wrote:
> >
> > > Mark,
> > >
> > > >> Is that correct? Did you mean SimpleConsumer or
> > > >> HighLevelConsumer? What are the differences?
> > >
> > > The high level consumer checkpoints the offsets in zookeeper, either
> > > periodically or based on an API call (look at commitOffsets()).
> > >
> > > If you want to checkpoint each and every message offset,
> > > exactly-once semantics will be expensive. But if you are willing to
> > > tolerate a small window of duplicates, you could buffer and write
> > > the offsets in batches.
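Evan's formula above, "exactly once == at least once + de-duplication", can be sketched as follows. This is a toy illustration rather than Kafka API: the `(msg_id, payload)` stream and the redelivery pattern are invented for the example, and in practice the seen-set would itself need to be stored durably.

```python
# Exactly-once as "at least once + de-dup" (toy sketch; nothing here is
# Kafka API -- the (msg_id, payload) stream is invented to show how
# redeliveries after an offset rollback are absorbed by a seen-set).

def consume_exactly_once(stream):
    seen = set()                 # in practice this must be durable too
    out = []
    for msg_id, payload in stream:
        if msg_id in seen:       # redelivery after a rollback: drop it
            continue
        seen.add(msg_id)
        out.append(payload)      # "process" the message exactly once
    return out

# At-least-once delivery: offsets 1-2 are redelivered after a rollback.
redelivered = [(1, "a"), (2, "b"), (1, "a"), (2, "b"), (3, "c")]
print(consume_exactly_once(redelivered))  # ['a', 'b', 'c']
```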
> > > If you choose to do the former, the commitOffsets() approach is
> > > expensive, since that can lead to too many writes on zookeeper. If
> > > you choose the latter, it could be fine, and you can use the high
> > > level consumer itself.
> > >
> > > On the contrary, if your consumer is writing the messages to some
> > > database or persistent storage, you might be better off using
> > > SimpleConsumer. There was another discussion about making the offset
> > > storage of the high level consumer pluggable, but we don't have that
> > > feature yet.
> > >
> > > Thanks,
> > > Neha
> > >
> > >
> > > On Thu, Dec 8, 2011 at 9:32 AM, Jun Rao <jun...@gmail.com> wrote:
> > >
> > > > Currently, the high level consumer (with ZK integration) doesn't
> > > > expose offsets to the consumer. Only SimpleConsumer does.
> > > >
> > > > Jun
> > > >
> > > > On Thu, Dec 8, 2011 at 9:15 AM, Mark <static.void....@gmail.com>
> > > > wrote:
> > > >
> > > > > "This is only possible through SimpleConsumer right now."
> > > > >
> > > > > Is that correct? Did you mean SimpleConsumer or
> > > > > HighLevelConsumer? What are the differences?
> > > > >
> > > > > On 12/8/11 8:53 AM, Jun Rao wrote:
> > > > >
> > > > >> Mark,
> > > > >>
> > > > >> Today, this is mostly the responsibility of the consumer, by
> > > > >> managing the offsets properly. For example, if the consumer
> > > > >> periodically flushes messages to disk, it has to checkpoint to
> > > > >> disk the offset corresponding to the last flush. On failure,
> > > > >> the consumer has to rewind the consumption from the last
> > > > >> checkpointed offset. This is only possible through
> > > > >> SimpleConsumer right now.
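Jun's flush-then-checkpoint pattern (flush messages, checkpoint the offset of the last flush, and rewind to it after a crash) can be sketched as follows. This is a simulation under stated assumptions: the log, the "disk", the batch size, and the crash are all stand-ins invented for the example; only the offset bookkeeping mirrors what an application built on SimpleConsumer would do. It also shows the batching idea from above, since offsets are only written once per flush rather than per message.

```python
# Flush-then-checkpoint-then-rewind (simulated; the log, "disk", and
# crash are stand-ins -- only the offset bookkeeping mirrors the
# SimpleConsumer pattern described in the thread).

LOG = ["m0", "m1", "m2", "m3", "m4"]       # the partition's messages
disk = {"messages": [], "checkpoint": 0}   # durable state

def run(start_offset, flush_every=2, crash_after=None):
    """Consume from start_offset, flushing and checkpointing every
    `flush_every` messages; optionally crash after N messages."""
    buffered = []
    offset = start_offset
    consumed = 0
    while offset < len(LOG):
        buffered.append(LOG[offset])
        offset += 1
        consumed += 1
        if len(buffered) == flush_every:
            disk["messages"].extend(buffered)  # flush first...
            disk["checkpoint"] = offset        # ...then checkpoint
            buffered = []
        if consumed == crash_after:
            return                             # unflushed buffer is lost
    if buffered:                               # final flush at end of log
        disk["messages"].extend(buffered)
        disk["checkpoint"] = offset

run(disk["checkpoint"], crash_after=3)  # crash with "m2" still buffered
# On restart, rewind to the last checkpointed offset:
run(disk["checkpoint"])
print(disk["messages"])  # ['m0', 'm1', 'm2', 'm3', 'm4'] -- no loss
```

Because the checkpoint is only advanced after a successful flush, the crash loses nothing: the restart re-fetches "m2" from the last checkpointed offset, which is exactly the small window of re-consumption (and potential duplicates) the batching trade-off accepts.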
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Jun
> > > > >>
> > > > >> On Thu, Dec 8, 2011 at 8:18 AM, Mark <static.void....@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >>> How can one guarantee exactly-once semantics when using Kafka
> > > > >>> as a traditional queue? Is this guarantee the responsibility
> > > > >>> of the consumer?
>
> --
> *Evan Chan*
> Senior Software Engineer |
> e...@ooyala.com | (650) 996-4600
> www.ooyala.com | blog <http://www.ooyala.com/blog> |
> @ooyala <http://www.twitter.com/ooyala>