On Thu, Dec 8, 2011 at 9:36 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> Hisham,
>
>> Does this mean that if autocommit.enable is set to true then calling
>> commitOffsets() does nothing?
>
> No. It means that in addition to the automatic offset commit, you will ask
> the consumer to commit offsets when you want.
Perfect.

>> My goal is to signal the consumer and
>> ask it to stop consuming / processing messages, call commitOffsets(),
>
> When you call commitOffsets, ONLY offsets for the messages returned by the
> consumer iterator will be committed. It will not prematurely commit data
> that you haven't consumed.

Fantastic as well. Thanks again!

hmb.

> Thanks,
> Neha
>
> On Thu, Dec 8, 2011 at 6:29 PM, Hisham Mardam-Bey <his...@mate1inc.com> wrote:
>
>> On Thu, Dec 8, 2011 at 3:47 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>>
>>> Evan,
>>>
>>> Please look at autocommit.enable at
>>> http://incubator.apache.org/kafka/configuration.html
>>> If it is false, you can control the offset storage via the commitOffsets
>>> API call.
>>
>> Does this mean that if autocommit.enable is set to true then calling
>> commitOffsets() does nothing? My goal is to signal the consumer and
>> ask it to stop consuming / processing messages, call commitOffsets(),
>> then shut down the consumer. Would this work, or do I also have to worry
>> about what has been pulled from the broker (in a batch, maybe sitting in
>> a buffer) but not yet consumed by the consumer?
>>
>> Thanks,
>>
>> hmb.
>>
>>>> So, commit the offset when you have an ack, however that is defined;
>>>> roll back to an earlier offset when you don't get acks,
>>>> and de-dup as necessary.
>>>
>>> Sounds like you can use commitOffsets() right after getting an ack.
>>>
>>> Thanks,
>>> Neha
>>>
>>> On Thu, Dec 8, 2011 at 12:44 PM, Evan Chan <e...@ooyala.com> wrote:
>>>
>>>> What you mean is that we need to modify (have our own modified copy of)
>>>> the high-level consumer (specifically the ConsumerConnector) so that,
>>>> instead of automatically calling commitOffsets(), we can call
>>>> commitOffsets() at our own discretion, when we know that the messages
>>>> have gotten to their destination.
>>>>
>>>> I am planning to do this, BTW, for a similar use case.
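A minimal, self-contained sketch of the behavior Neha describes: commitOffsets() commits only the offsets of messages already returned by the consumer iterator, not data fetched into a buffer but not yet handed to the application. All names here are invented for illustration; this is a toy model, not the Kafka client API.

```python
class ToyConsumer:
    """Toy stand-in for a high-level consumer with a pre-fetched buffer."""

    def __init__(self, messages):
        self.buffer = list(messages)  # pre-fetched from the "broker"
        self.consumed = 0             # messages handed to the application
        self.committed_offset = 0     # last checkpointed offset

    def __iter__(self):
        return self

    def __next__(self):
        if self.consumed >= len(self.buffer):
            raise StopIteration
        msg = self.buffer[self.consumed]
        self.consumed += 1
        return msg

    def commit_offsets(self):
        # Commit only what the iterator has returned so far,
        # never the buffered-but-unconsumed tail.
        self.committed_offset = self.consumed


consumer = ToyConsumer(["m%d" % i for i in range(10)])
for i, msg in enumerate(consumer):
    if i == 3:  # application decides to stop after 4 messages
        break
consumer.commit_offsets()
print(consumer.committed_offset)  # 4
```

So a shutdown sequence of "stop consuming, commit, shut down" leaves the buffered-but-unconsumed messages uncommitted, and they are redelivered on restart.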
>>>> Exactly once == at least once + de-duplication.
>>>> So, commit the offset when you have an ack, however that is defined;
>>>> roll back to an earlier offset when you don't get acks,
>>>> and de-dup as necessary.
>>>>
>>>> -Evan
>>>>
>>>> On Thu, Dec 8, 2011 at 10:03 AM, Jun Rao <jun...@gmail.com> wrote:
>>>>
>>>>> Neha is right. It's possible to achieve exactly-once delivery even in
>>>>> the high-level consumer. What you have to do is make sure all consumed
>>>>> messages are really consumed and then call commitOffsets(). When you
>>>>> call commitOffsets(), all messages returned to the app should have been
>>>>> fully consumed or put in a safe place.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jun
>>>>>
>>>>> On Thu, Dec 8, 2011 at 9:52 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>>>>>
>>>>>> Mark,
>>>>>>
>>>>>>> Is that correct? Did you mean SimpleConsumer or HighLevelConsumer?
>>>>>>> What are the differences?
>>>>>>
>>>>>> The high-level consumer checkpoints the offsets in ZooKeeper, either
>>>>>> periodically or based on an API call (look at commitOffsets()).
>>>>>>
>>>>>> If you want to checkpoint each and every message offset, exactly-once
>>>>>> semantics will be expensive. But if you are willing to tolerate a
>>>>>> small window of duplicates, you could buffer and write the offsets in
>>>>>> batches. If you choose the former, the commitOffsets() approach is
>>>>>> expensive, since it can lead to too many writes to ZooKeeper. If you
>>>>>> choose the latter, it could be fine, and you can use the high-level
>>>>>> consumer itself.
>>>>>>
>>>>>> On the other hand, if your consumer is writing the messages to some
>>>>>> database or persistent storage, you might be better off using
>>>>>> SimpleConsumer.
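Evan's "exactly once == at least once + de-duplication" can be sketched in a few lines: after a lost ack, the consumer rolls back and redelivers, and the sink drops anything at or below the highest offset it has already applied. The class and method names below are invented for illustration, not a Kafka API.

```python
class DedupingSink:
    """Applies each (offset, payload) at most once despite redelivery."""

    def __init__(self):
        self.last_applied = -1  # highest offset durably applied
        self.applied = []

    def apply(self, offset, payload):
        if offset <= self.last_applied:
            return False  # duplicate from a rollback/redelivery; drop it
        self.applied.append(payload)
        self.last_applied = offset
        return True


sink = DedupingSink()
# First attempt delivers offsets 0-2; the ack is lost, so the consumer
# rolls back and redelivers 0-3 (at-least-once behavior).
deliveries = [(0, "a"), (1, "b"), (2, "c"),
              (0, "a"), (1, "b"), (2, "c"), (3, "d")]
for offset, payload in deliveries:
    sink.apply(offset, payload)
print(sink.applied)  # ['a', 'b', 'c', 'd']
```

Tracking a single high-water mark works when redelivery always restarts from an earlier offset in order, which is what rolling back a consumer's fetch offset gives you; out-of-order sources would need a set of seen IDs instead.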
>>>>>> There was another discussion about making the offset storage of the
>>>>>> high-level consumer pluggable, but we don't have that feature yet.
>>>>>>
>>>>>> Thanks,
>>>>>> Neha
>>>>>>
>>>>>> On Thu, Dec 8, 2011 at 9:32 AM, Jun Rao <jun...@gmail.com> wrote:
>>>>>>
>>>>>>> Currently, the high-level consumer (with ZK integration) doesn't
>>>>>>> expose offsets to the consumer. Only SimpleConsumer does.
>>>>>>>
>>>>>>> Jun
>>>>>>>
>>>>>>> On Thu, Dec 8, 2011 at 9:15 AM, Mark <static.void....@gmail.com> wrote:
>>>>>>>
>>>>>>>> "This is only possible through SimpleConsumer right now."
>>>>>>>>
>>>>>>>> Is that correct? Did you mean SimpleConsumer or HighLevelConsumer?
>>>>>>>> What are the differences?
>>>>>>>>
>>>>>>>> On 12/8/11 8:53 AM, Jun Rao wrote:
>>>>>>>>
>>>>>>>>> Mark,
>>>>>>>>>
>>>>>>>>> Today, this is mostly the responsibility of the consumer, by
>>>>>>>>> managing the offsets properly. For example, if the consumer
>>>>>>>>> periodically flushes messages to disk, it has to checkpoint to disk
>>>>>>>>> the offset corresponding to the last flush. On failure, the
>>>>>>>>> consumer has to rewind the consumption from the last checkpointed
>>>>>>>>> offset. This is only possible through SimpleConsumer right now.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Jun
>>>>>>>>>
>>>>>>>>> On Thu, Dec 8, 2011 at 8:18 AM, Mark <static.void....@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> How can one guarantee exactly-once semantics when using Kafka as a
>>>>>>>>>> traditional queue? Is this guarantee the responsibility of the
>>>>>>>>>> consumer?
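The trade-off Neha describes above (batched offset checkpoints keep the write rate to the offset store low, at the cost of a small window of duplicates after a crash) can be modeled without any Kafka machinery. The batch size and function names below are hypothetical, chosen only to make the window visible.

```python
BATCH = 5  # checkpoint the offset once every BATCH messages

def run(messages, start, store, crash_after=None):
    """Process messages from `start`, checkpointing every BATCH messages.

    `store` plays the role of ZooKeeper (or any durable offset store).
    """
    processed = []
    for n, offset in enumerate(range(start, len(messages)), 1):
        processed.append(messages[offset])
        if n % BATCH == 0:
            store["offset"] = offset + 1  # next offset to read on restart
        if crash_after is not None and n == crash_after:
            return processed  # simulate a crash before the next checkpoint
    store["offset"] = len(messages)
    return processed


msgs = list("abcdefghij")
store = {"offset": 0}
first = run(msgs, store["offset"], store, crash_after=7)  # crash mid-batch
second = run(msgs, store["offset"], store)                # restart from checkpoint
print(first, second)
# ['a'..'g'] then ['f'..'j']: the checkpoint was at offset 5, so 'f' and
# 'g' are seen twice -- the duplicate window traded for fewer store writes.
```

With BATCH = 1 the window disappears but every message costs a write to the offset store, which is exactly why per-message commitOffsets() against ZooKeeper is called out as expensive in the thread.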
--
*Evan Chan*
Senior Software Engineer |
e...@ooyala.com | (650) 996-4600
www.ooyala.com | blog <http://www.ooyala.com/blog> |
@ooyala <http://www.twitter.com/ooyala>

--
Hisham Mardam-Bey
Director of Engineering | Mate1 Inc.
4200 St. Laurent Boulevard | Suite 550
Montreal, Quebec | H2W 2R2
t. +1.514.393.1414 x264

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

-=[ Codito Ergo Sum ]=-