On Thu, Dec 8, 2011 at 3:47 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> Evan,
>
> Please look at autocommit.enable at
> http://incubator.apache.org/kafka/configuration.html
> If it is false, you can control the offset storage via the commitOffsets
> API call.
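[For reference, the setting Neha points to might look like the following consumer properties fragment. This is a minimal sketch against the 0.7-era configuration page linked above; the zk.connect and groupid values are placeholders.]

```properties
# 0.7-era high level consumer settings (placeholder values)
zk.connect=localhost:2181
groupid=my-consumer-group
# Disable periodic offset autocommit so offsets are written to zookeeper
# only when the application calls commitOffsets() itself.
autocommit.enable=false
```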
Does this mean that if autocommit.enable is set to true, calling
commitOffsets() does nothing? My goal is to signal the consumer to stop
consuming/processing messages, call commitOffsets(), and then shut the
consumer down. Would this work, or do I also need to worry about messages
that have already been pulled from the broker (in a batch, perhaps sitting
in a buffer) but not yet consumed?

Thanks,
hmb.

>> So, commit the offset when you have an ack, however that is defined;
>> Rollback to an earlier offset when you don't get acks,
>> and de-dup as necessary.
>
> Sounds like you can use commitOffsets() right after getting an ack.
>
> Thanks,
> Neha
>
> On Thu, Dec 8, 2011 at 12:44 PM, Evan Chan <e...@ooyala.com> wrote:
>
>> What you mean is that we need to modify (have our own modified copy of)
>> the high level consumer (specifically the ConsumerConnector) so that
>> instead of automatically calling commitOffsets(), we can call
>> commitOffsets() at our own discretion, when we know that the messages
>> have gotten to their destination.
>>
>> I am planning to do this BTW for a similar use case.
>> Exactly once == at least once + de-duplication.
>> So, commit the offset when you have an ack, however that is defined;
>> Rollback to an earlier offset when you don't get acks,
>> and de-dup as necessary.
>>
>> -Evan
>>
>> On Thu, Dec 8, 2011 at 10:03 AM, Jun Rao <jun...@gmail.com> wrote:
>>
>>> Neha is right. It's possible to achieve exactly-once delivery even in
>>> the high level consumer. What you have to do is make sure all consumed
>>> messages are really consumed and then call commitOffsets(). When you
>>> call commitOffsets(), all messages returned to the apps should have
>>> been fully consumed or put in a safe place.
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>> On Thu, Dec 8, 2011 at 9:52 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>>>
>>>> Mark,
>>>>
>>>>>> Is that correct? Did you mean SimpleConsumer or HighLevelConsumer?
>>>>>> What are the differences?
>>>>
>>>> The high level consumer checkpoints the offsets in zookeeper, either
>>>> periodically or based on an API call (look at commitOffsets()).
>>>>
>>>> If you want to checkpoint each and every message offset, exactly-once
>>>> semantics will be expensive. But if you are willing to tolerate a
>>>> small window of duplicates, you could buffer and write the offsets in
>>>> batches. If you choose the former, the commitOffsets() approach is
>>>> expensive, since it can lead to too many writes on zookeeper. If you
>>>> choose the latter, it could be fine, and you can use the high level
>>>> consumer itself.
>>>>
>>>> Conversely, if your consumer is writing the messages to some database
>>>> or persistent storage, you might be better off using SimpleConsumer.
>>>> There was another discussion about making the offset storage of the
>>>> high level consumer pluggable, but we don't have that feature yet.
>>>>
>>>> Thanks,
>>>> Neha
>>>>
>>>> On Thu, Dec 8, 2011 at 9:32 AM, Jun Rao <jun...@gmail.com> wrote:
>>>>
>>>>> Currently, the high level consumer (with ZK integration) doesn't
>>>>> expose offsets to the consumer. Only SimpleConsumer does.
>>>>>
>>>>> Jun
>>>>>
>>>>> On Thu, Dec 8, 2011 at 9:15 AM, Mark <static.void....@gmail.com> wrote:
>>>>>
>>>>>> "This is only possible through SimpleConsumer right now."
>>>>>>
>>>>>> Is that correct? Did you mean SimpleConsumer or HighLevelConsumer?
>>>>>> What are the differences?
>>>>>>
>>>>>> On 12/8/11 8:53 AM, Jun Rao wrote:
>>>>>>
>>>>>>> Mark,
>>>>>>>
>>>>>>> Today, this is mostly the responsibility of the consumer, by
>>>>>>> managing the offsets properly.
>>>>>>> For example, if the consumer periodically flushes messages to disk,
>>>>>>> it has to checkpoint to disk the offset corresponding to the last
>>>>>>> flush. On failure, the consumer has to rewind the consumption from
>>>>>>> the last checkpointed offset. This is only possible through
>>>>>>> SimpleConsumer right now.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jun
>>>>>>>
>>>>>>> On Thu, Dec 8, 2011 at 8:18 AM, Mark <static.void....@gmail.com> wrote:
>>>>>>>
>>>>>>>> How can one guarantee exactly-once semantics when using Kafka as a
>>>>>>>> traditional queue? Is this guarantee the responsibility of the
>>>>>>>> consumer?
>>
>> --
>> *Evan Chan*
>> Senior Software Engineer |
>> e...@ooyala.com | (650) 996-4600
>> www.ooyala.com | blog <http://www.ooyala.com/blog> |
>> @ooyala <http://www.twitter.com/ooyala>
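[Evan's formula in the thread above, "exactly once == at least once + de-duplication", can be sketched as a small offset-based de-dup guard. This is a stand-alone illustration, not Kafka API code: the OffsetDedup class and shouldProcess method are invented names, and the offsets stand in for whatever (partition, offset) identifiers the consumer tracks.]

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of "exactly once == at least once + de-duplication".
// OffsetDedup is a made-up helper, not part of the Kafka API: it remembers
// which offsets have already been handed to the downstream sink, so that
// messages replayed after a rollback to an earlier offset are skipped.
public class OffsetDedup {
    private final Set<Long> processed = new HashSet<Long>();

    // Returns true if the offset is new (the caller should process the
    // message), false if it is a duplicate from a redelivery.
    public boolean shouldProcess(long offset) {
        return processed.add(offset);
    }
}
```

[In the pattern Evan and Neha describe, the application would call commitOffsets() once the downstream system acks; on a missing ack it rewinds to the last committed offset, and a guard like this drops the replayed duplicates.]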