In the broker, the name of each log file contains the offset of the first
message in that file. So the last offset can be computed as the offset in the
last log file's name plus that file's length.

Jun
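
A minimal sketch of the "filename + file length" computation, in Python rather than the broker's own code. It assumes each segment file is named with its zero-padded starting offset plus a `.kafka` suffix, and that offsets are byte positions in the log. The function name `last_offset` and the regex are hypothetical, chosen for illustration only. The thread does not say what the broker does when a topic directory holds no segments, so the sketch just returns None in that case.

```python
import os
import re
from typing import Optional

# Assumption: segment files look like "00000000000000000000.kafka",
# where the digits are the offset of the first message in the file.
SEGMENT_NAME = re.compile(r"^(\d+)\.kafka$")


def last_offset(topic_dir: str) -> Optional[int]:
    """Return the starting offset of the newest segment plus its size in bytes.

    Returns None when the directory contains no segment files; how a real
    broker handles that case is not covered here.
    """
    segments = []
    for name in os.listdir(topic_dir):
        match = SEGMENT_NAME.match(name)
        if match:
            segments.append((int(match.group(1)), name))
    if not segments:
        return None
    # The newest segment is the one with the largest starting offset.
    start, name = max(segments)
    return start + os.path.getsize(os.path.join(topic_dir, name))
```

Calling `last_offset` on a topic directory whose only segment starts at 0 and is 100 bytes long gives 100. After a second segment named for offset 100 receives 40 bytes, it gives 140.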

On Fri, Nov 18, 2011 at 8:52 AM, Taylor Gautier <tgaut...@tagged.com> wrote:

> Right. I'm talking about the broker. Where does it store what is the
> most recent offset if there are no log segments?  And no ZK.
>
>
>
> On Nov 18, 2011, at 8:50 AM, Jun Rao <jun...@gmail.com> wrote:
>
> > What I described is what happens in the broker. If you use SimpleConsumer,
> > then it's the consumer's responsibility to remember the last offset. The
> > server doesn't store the state for consumers.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Nov 18, 2011 at 8:19 AM, Taylor Gautier <tgaut...@tagged.com> wrote:
> >
> >> how?  where is the information kept?  If ZK is not around, and it's not on
> >> disk, how is this information passed to the next process after the restart?
> >>
> >> On Fri, Nov 18, 2011 at 8:04 AM, Jun Rao <jun...@gmail.com> wrote:
> >>
> >>> 4) is incorrect. "Last offset" remains 'a' even after the data is
> >>> cleaned. So in 5), the offset will be 2 x 'a'. That is, we never recycle
> >>> offsets. They keep increasing.
> >>>
> >>> Thanks,
> >>>
> >>> Jun
> >>>
> >>> On Fri, Nov 18, 2011 at 7:02 AM, Taylor Gautier <tgaut...@tagged.com> wrote:
> >>>
> >>>> I don't use high level consumers - just low level.  What I was thinking
> >>>> was the following.  Let's assume I have turned off ZK in my setup.
> >>>>
> >>>> 1) Send 1 message to topic A.  Kafka creates a directory and log segment
> >>>> for A.  The log segment starts at 0.   Now, the "last offset" of the
> >>>> topic is a.
> >>>>
> >>>> 2) A consumer reads the message from topic A, and records that the most
> >>>> recent offset in topic A is a.
> >>>>
> >>>> 3) Much time passes, the cleaner runs, and deletes the log segment
> >>>>
> >>>> 4) More time passes, I restart Kafka.  Kafka sees the topic A directory,
> >>>> but has no segment file to initialize from.  So the "last offset" is
> >>>> considered to be 0.
> >>>>
> >>>> 5) Send 1 message to topic A.  Kafka creates a log segment for A starting
> >>>> at 0.   The new last offset of the topic is a'.
> >>>>
> >>>> 6) The consumer from step 2 tries to read from Kafka at offset a, but
> >>>> this is now an invalid offset.
> >>>>
> >>>> Does that sound right?  I haven't tried this yet, I'm just doing a
> >>>> thought experiment here to try to figure out what would happen.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Nov 17, 2011 at 11:01 PM, Jun Rao <jun...@gmail.com> wrote:
> >>>>
> >>>>> This is true for the high-level ZK-based consumer.
> >>>>>
> >>>>> Jun
> >>>>>
> >>>>> On Thu, Nov 17, 2011 at 10:59 PM, Inder Pall <inder.p...@gmail.com> wrote:
> >>>>>
> >>>>>> Jun & Taylor,
> >>>>>> would it be right to say that consumers without ZK won't be a viable
> >>>>>> option if you can't handle replay of old messages in your application?
> >>>>>>
> >>>>>> - inder
> >>>>>>
> >>>>>> On Fri, Nov 18, 2011 at 12:27 PM, Jun Rao <jun...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Taylor,
> >>>>>>>
> >>>>>>> When you start a consumer, it always tries to get the last
> >>>>>>> checkpointed offset from ZK. If no offset can be found in ZK, the
> >>>>>>> consumer starts from either the smallest or the largest available
> >>>>>>> offset in the broker.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> Jun
> >>>>>>>
> >>>>>>> On Thu, Nov 17, 2011 at 9:20 PM, Taylor Gautier <tgaut...@tagged.com> wrote:
> >>>>>>>
> >>>>>>>> hmmm - and if you turn off zookeeper?
> >>>>>>>>
> >>>>>>>> On Thu, Nov 17, 2011 at 9:15 PM, Inder Pall <inder.p...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> The consumer offsets are stored in ZooKeeper by topic and partition.
> >>>>>>>>> That's how, in a consumer failover scenario, you don't get duplicate
> >>>>>>>>> messages.
> >>>>>>>>>
> >>>>>>>>> - Inder
> >>>>>>>>>
> >>>>>>>>> On Fri, Nov 18, 2011 at 10:33 AM, Taylor Gautier <tgaut...@tagged.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> We've noticed that the cleaner script in Kafka removes empty log
> >>>>>>>>>> segments but not the directories themselves.  I am actually
> >>>>>>>>>> wondering something - I always assumed that Kafka could restore
> >>>>>>>>>> the latest offset for existing topics by scanning the log
> >>>>>>>>>> directory for all directories and scanning the directories for
> >>>>>>>>>> log segment files to restore the latest offset.
> >>>>>>>>>>
> >>>>>>>>>> Now this conclusion I have made simply by observation - so it
> >>>>>>>>>> could be entirely wrong.
> >>>>>>>>>>
> >>>>>>>>>> My question is however - if I am right, and the cleaner removes
> >>>>>>>>>> all the log segments for a given topic so that a given topic
> >>>>>>>>>> directory is empty, how does Kafka behave when restarted?  How
> >>>>>>>>>> does it know what the next offset should be?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> -- Inder
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> -- Inder
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
