Right. I'm talking about the broker. Where does it store what is the most recent offset if there are no log segments? And no ZK.
On Nov 18, 2011, at 8:50 AM, Jun Rao <jun...@gmail.com> wrote: > What I described is what happens in the broker. If you use SimpleConsumer, > then it's the consumer's responsibility to remember the last offset. The > server doesn't store the state for consumers. > > Thanks, > > Jun > > On Fri, Nov 18, 2011 at 8:19 AM, Taylor Gautier <tgaut...@tagged.com> wrote: > >> how? where is the information kept? If ZK is not around, and it's not on >> disk, how is this information passed to the next process after the restart? >> >> On Fri, Nov 18, 2011 at 8:04 AM, Jun Rao <jun...@gmail.com> wrote: >> >>> 4) is incorrect. "Last offset" remains to be 'a' even after the data is >>> cleaned. So in 5), the offset will be 2 x 'a'. That is, we never recycle >>> offsets. They keep increasing. >>> >>> Thanks, >>> >>> Jun >>> >>> On Fri, Nov 18, 2011 at 7:02 AM, Taylor Gautier <tgaut...@tagged.com> >>> wrote: >>> >>>> I don't use high level consumers - just low level. What I was thinking >>> was >>>> the following. Let's assume I have turned off ZK in my setup. >>>> >>>> 1) Send 1 message to topic A. Kafka creates a directory and log >> segment >>>> for A. The log segment starts at 0. Now, the "last offset" of the >>> topic >>>> is a. >>>> >>>> 2) A consumer reads from topic A the message, and records that the most >>>> recent offset in topic A is a. >>>> >>>> 3) Much time passes, the cleaner runs, and deletes the log segment >>>> >>>> 4) More time passes, I restart Kafka. Kafka sees the topic A >> directory, >>>> but has no segment file to initialize from. So the "last offset" is >>>> considered to be 0. >>>> >>>> 5) Send 1 message to topic A. Kafka creates a log segment for A >> starting >>>> at 0. The new last offset of the topic is a'. >>>> >>>> 6) The consumer from step 2 tries to read from Kafka at offset a, but >>> this >>>> is now an invalid offset. >>>> >>>> Does that sound right? I haven't tried this yet, I'm just doing a >>> thought >>>> experiment here to try to figure out what would happen. >>>> >>>> >>>> >>>> >>>> On Thu, Nov 17, 2011 at 11:01 PM, Jun Rao <jun...@gmail.com> wrote: >>>> >>>>> This is true for the high-level ZK-based consumer. >>>>> >>>>> Jun >>>>> >>>>> On Thu, Nov 17, 2011 at 10:59 PM, Inder Pall <inder.p...@gmail.com> >>>> wrote: >>>>> >>>>>> Jun & Taylor, >>>>>> would it be right to say that consumers without ZK won't be a >> viable >>>>> option >>>>>> if you can't handle replay of old messages in your application. >>>>>> >>>>>> - inder >>>>>> >>>>>> On Fri, Nov 18, 2011 at 12:27 PM, Jun Rao <jun...@gmail.com> >> wrote: >>>>>> >>>>>>> Taylor, >>>>>>> >>>>>>> When you start a consumer, it always tries to get the last >>>> checkpointed >>>>>>> offset from ZK. If no offset can be found in ZK, the consumer >>> starts >>>>> from >>>>>>> either the smallest or the largest available offset in the >> broker. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Jun >>>>>>> >>>>>>> On Thu, Nov 17, 2011 at 9:20 PM, Taylor Gautier < >>> tgaut...@tagged.com >>>>> >>>>>>> wrote: >>>>>>> >>>>>>>> hmmm - and if you turn off zookeeper? >>>>>>>> >>>>>>>> On Thu, Nov 17, 2011 at 9:15 PM, Inder Pall < >>> inder.p...@gmail.com> >>>>>>> wrote: >>>>>>>> >>>>>>>>> The consumer offsets are stored in ZooKeeper by topic and >>>>> partition. >>>>>>>>> That's how in a consumer fail over scenario you don't get >>>> duplicate >>>>>>>>> messages >>>>>>>>> >>>>>>>>> - Inder >>>>>>>>> >>>>>>>>> On Fri, Nov 18, 2011 at 10:33 AM, Taylor Gautier < >>>>>> tgaut...@tagged.com >>>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> We've noticed that the cleaner script in Kafka removes >> empty >>>> log >>>>>>>> segments >>>>>>>>>> but not the directories themselves. I am actually >> wondering >>>>>>> something >>>>>>>> - >>>>>>>>> I >>>>>>>>>> always assumed that Kafka could restore the latest offset >> for >>>>>>> existing >>>>>>>>>> topics by scanning the log directory for all directories >> and >>>>>> scanning >>>>>>>> the >>>>>>>>>> directories for log segment files to restore the latest >>> offset. >>>>>>>>>> >>>>>>>>>> Now this conclusion I have made simply by observation - so >> it >>>>> could >>>>>>> be >>>>>>>>>> entirely wrong. >>>>>>>>>> >>>>>>>>>> My question is however - if I am right, and the cleaner >>> removes >>>>> all >>>>>>> the >>>>>>>>> log >>>>>>>>>> segments for a given topic so that a given topic directory >> is >>>>>> empty, >>>>>>>> how >>>>>>>>>> does Kafka behave when restarted? How does it know what >> the >>>> next >>>>>>>> offset >>>>>>>>>> should be? >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> -- Inder >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -- Inder >>>>>> >>>>> >>>> >>> >>