What you mean is that we need to modify (have our own modified copy of) the high level consumer (specifically the ConsumerConnector) so that instead of it automatically calling commitOffsets(), we can call commitOffsets() at our own discretion, once we know the messages have reached their destination -- roughly as in the sketch below.
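(A minimal sketch of that kind of discretionary commit with the high-level consumer, assuming auto-commit is switched off in the config. The property names and generic types follow the later 0.8-style Java API and may differ in other versions, and deliver() is just a placeholder for however the messages reach their destination, so treat the details as assumptions rather than a drop-in implementation.)

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class ManualCommitConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        // Config keys differ across versions; the point is only that
        // auto-commit is disabled so we decide when offsets are committed.
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "my-group");
        props.put("auto.commit.enable", "false");

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("my-topic", 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCountMap);

        ConsumerIterator<byte[], byte[]> it =
                streams.get("my-topic").get(0).iterator();
        int uncommitted = 0;
        while (it.hasNext()) {
            MessageAndMetadata<byte[], byte[]> msg = it.next();
            deliver(msg.message());   // hypothetical: hand off to the destination
            // Committing after every message means a zookeeper write per
            // message, so commit in small batches once delivery is confirmed.
            if (++uncommitted >= 100) {
                connector.commitOffsets();
                uncommitted = 0;
            }
        }
    }

    private static void deliver(byte[] payload) {
        // Placeholder for whatever "gotten to their destination" means here.
    }
}

The batching mirrors Neha's point below about too many zookeeper writes; commit more or less often depending on how large a window of duplicates you can tolerate after a failure.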
I am planning to do this BTW for a similar use case. Exactly once == at
least once + de-duplication. So, commit the offset when you have an ack,
however that is defined; roll back to an earlier offset when you don't get
acks, and de-dup as necessary.

-Evan

On Thu, Dec 8, 2011 at 10:03 AM, Jun Rao <jun...@gmail.com> wrote:

> Neha is right. It's possible to achieve exactly-once delivery even in the
> high level consumer. What you have to do is make sure all consumed
> messages are really consumed and then call commitOffsets(). When you call
> commitOffsets(), all messages returned to the apps should have been fully
> consumed or put in a safe place.
>
> Thanks,
>
> Jun
>
> On Thu, Dec 8, 2011 at 9:52 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>
> > Mark,
> >
> > >> Is that correct? Did you mean SimpleConsumer or HighLevelConsumer?
> > >> What are the differences?
> >
> > The high level consumer checkpoints the offsets in zookeeper, either
> > periodically or based on an API call (look at commitOffsets()).
> >
> > If you want to checkpoint each and every message offset, exactly-once
> > semantics will be expensive. But if you are willing to tolerate a small
> > window of duplicates, you could buffer and write the offsets in batches.
> > If you choose the former, the commitOffsets() approach is expensive,
> > since it can lead to too many writes to zookeeper. If you choose the
> > latter, it could be fine, and you can use the high level consumer
> > itself.
> >
> > On the other hand, if your consumer is writing the messages to some
> > database or persistent storage, you might be better off using
> > SimpleConsumer. There was another discussion about making the offset
> > storage of the high level consumer pluggable, but we don't have that
> > feature yet.
> >
> > Thanks,
> > Neha
> >
> > On Thu, Dec 8, 2011 at 9:32 AM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > Currently, the high level consumer (with ZK integration) doesn't
> > > expose offsets to the consumer. Only SimpleConsumer does.
> > >
> > > Jun
> > >
> > > On Thu, Dec 8, 2011 at 9:15 AM, Mark <static.void....@gmail.com> wrote:
> > >
> > > > "This is only possible through SimpleConsumer right now."
> > > >
> > > > Is that correct? Did you mean SimpleConsumer or HighLevelConsumer?
> > > > What are the differences?
> > > >
> > > > On 12/8/11 8:53 AM, Jun Rao wrote:
> > > >
> > > >> Mark,
> > > >>
> > > >> Today, this is mostly the responsibility of the consumer, by
> > > >> managing the offsets properly. For example, if the consumer
> > > >> periodically flushes messages to disk, it has to checkpoint to disk
> > > >> the offset corresponding to the last flush. On failure, the consumer
> > > >> has to rewind the consumption from the last checkpointed offset.
> > > >> This is only possible through SimpleConsumer right now.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jun
> > > >>
> > > >> On Thu, Dec 8, 2011 at 8:18 AM, Mark <static.void....@gmail.com> wrote:
> > > >>
> > > >>> How can one guarantee exactly-once semantics when using Kafka as a
> > > >>> traditional queue? Is this guarantee the responsibility of the
> > > >>> consumer?

--
Evan Chan
Senior Software Engineer | e...@ooyala.com | (650) 996-4600
www.ooyala.com | blog <http://www.ooyala.com/blog> | @ooyala <http://www.twitter.com/ooyala>
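(For the "at least once + de-duplication" half described above, a minimal sketch of the de-dup side, assuming the destination can remember the highest offset it has already applied. DedupingApplier, writeToSink(), and the per-partition bookkeeping are hypothetical placeholders, not Kafka APIs.)

import java.util.HashMap;
import java.util.Map;

public class DedupingApplier {

    // Highest offset already applied, keyed by partition. In practice this
    // would live in the same store as the data, so the offset and the data
    // are committed (or rolled back) together.
    private final Map<Integer, Long> lastApplied = new HashMap<Integer, Long>();

    // Returns true if the message was applied, false if it was a duplicate
    // redelivered after a rewind to an earlier offset.
    public boolean apply(int partition, long offset, byte[] payload) {
        Long high = lastApplied.get(partition);
        if (high != null && offset <= high) {
            return false;   // already applied before the failure; drop it
        }
        writeToSink(partition, offset, payload);
        lastApplied.put(partition, offset);
        return true;
    }

    private void writeToSink(int partition, long offset, byte[] payload) {
        // Placeholder: write the payload and the offset to the destination
        // in one unit, so a restart can resume from lastApplied.
    }
}

On restart after missing acks, rewind consumption to the last committed offset and replay; anything at or below lastApplied for that partition is dropped, which is the de-dup that turns at-least-once into effectively exactly-once.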