how?  where is the information kept?  If ZK is not around, and it's not on
disk, how is this information passed to the next process after the restart?

On Fri, Nov 18, 2011 at 8:04 AM, Jun Rao <jun...@gmail.com> wrote:

> 4) is incorrect. "Last offset" remains 'a' even after the data is
> cleaned. So in 5), the offset will be 2 x 'a'. That is, we never recycle
> offsets. They keep increasing.
>
> Thanks,
>
> Jun
>
> On Fri, Nov 18, 2011 at 7:02 AM, Taylor Gautier <tgaut...@tagged.com>
> wrote:
>
> > I don't use high level consumers - just low level.  What I was thinking
> > was the following.  Let's assume I have turned off ZK in my setup.
> >
> > 1) Send 1 message to topic A.  Kafka creates a directory and log segment
> > for A.  The log segment starts at 0.   Now, the "last offset" of the
> > topic is a.
> >
> > 2) A consumer reads from topic A the message, and records that the most
> > recent offset in topic A is a.
> >
> > 3) Much time passes, the cleaner runs, and deletes the log segment
> >
> > 4) More time passes, I restart Kafka.  Kafka sees the topic A directory,
> > but has no segment file to initialize from.  So the "last offset" is
> > considered to be 0.
> >
> > 5) Send 1 message to topic A.  Kafka creates a log segment for A starting
> > at 0.   The new last offset of the topic is a'.
> >
> > 6) The consumer from step 2 tries to read from Kafka at offset a, but
> > this is now an invalid offset.
> >
> > Does that sound right?  I haven't tried this yet, I'm just doing a
> > thought experiment here to try to figure out what would happen.
> >
> >
> >
> >
> > On Thu, Nov 17, 2011 at 11:01 PM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > This is true for the high-level ZK-based consumer.
> > >
> > > Jun
> > >
> > > > On Thu, Nov 17, 2011 at 10:59 PM, Inder Pall <inder.p...@gmail.com>
> > > > wrote:
> > >
> > > > Jun & Taylor,
> > > > > would it be right to say that consumers without ZK won't be a viable
> > > > > option if you can't handle replay of old messages in your application?
> > > >
> > > > - inder
> > > >
> > > > On Fri, Nov 18, 2011 at 12:27 PM, Jun Rao <jun...@gmail.com> wrote:
> > > >
> > > > > Taylor,
> > > > >
> > > > > > When you start a consumer, it always tries to get the last
> > > > > > checkpointed offset from ZK. If no offset can be found in ZK, the
> > > > > > consumer starts from either the smallest or the largest available
> > > > > > offset in the broker.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > > On Thu, Nov 17, 2011 at 9:20 PM, Taylor Gautier <tgaut...@tagged.com>
> > > > > > wrote:
> > > > >
> > > > > > hmmm - and if you turn off zookeeper?
> > > > > >
> > > > > > > On Thu, Nov 17, 2011 at 9:15 PM, Inder Pall <inder.p...@gmail.com>
> > > > > > > wrote:
> > > > > >
> > > > > > > > The consumer offsets are stored in ZooKeeper by topic and
> > > > > > > > partition. That's how, in a consumer failover scenario, you
> > > > > > > > don't get duplicate messages.
> > > > > > >
> > > > > > > - Inder
> > > > > > >
> > > > > > > > On Fri, Nov 18, 2011 at 10:33 AM, Taylor Gautier <tgaut...@tagged.com>
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > > We've noticed that the cleaner script in Kafka removes empty
> > > > > > > > > log segments but not the directories themselves.  I am
> > > > > > > > > actually wondering something - I always assumed that Kafka
> > > > > > > > > could restore the latest offset for existing topics by
> > > > > > > > > scanning the log directory for all directories and scanning
> > > > > > > > > the directories for log segment files to restore the latest
> > > > > > > > > offset.
> > > > > > > >
> > > > > > > > > Now this conclusion I have made simply by observation - so it
> > > > > > > > > could be entirely wrong.
> > > > > > > >
> > > > > > > > > My question is however - if I am right, and the cleaner
> > > > > > > > > removes all the log segments for a given topic so that a
> > > > > > > > > given topic directory is empty, how does Kafka behave when
> > > > > > > > > restarted?  How does it know what the next offset should be?
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Inder
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Inder
> > > >
> > >
> >
>
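
To make the disagreement between steps 4) and 5) and Jun's correction concrete, here is a minimal sketch of a log whose offsets are never recycled. It is a toy model, not Kafka code: the class name ToyLog, its methods, and the assumption that the offset advances by the message size in bytes are all hypothetical. It also says nothing about how a broker would retain the end offset across a restart, which is the open question at the top of this message.

```python
class ToyLog:
    """Toy model of one topic's log. Not Kafka code; all names are hypothetical."""

    def __init__(self):
        self.segments = []   # list of (start_offset, size_in_bytes)
        self.end_offset = 0  # next offset to assign; it only ever grows

    def append(self, message):
        """Append a message (bytes) and return the new "last offset"."""
        if not self.segments:
            # Assumption: a fresh segment starts at the current end offset,
            # not at 0, so earlier offsets are never handed out again.
            self.segments.append((self.end_offset, 0))
        start, size = self.segments[-1]
        self.segments[-1] = (start, size + len(message))
        self.end_offset += len(message)
        return self.end_offset

    def clean(self):
        """Delete every segment, as the cleaner does, but keep end_offset."""
        self.segments.clear()

    def can_fetch(self, offset):
        """An offset is fetchable if it lies in [earliest start, end_offset]."""
        earliest = self.segments[0][0] if self.segments else self.end_offset
        return earliest <= offset <= self.end_offset


log = ToyLog()
a = log.append(b"x" * 10)      # step 1: last offset is a
log.clean()                    # step 3: cleaner deletes the segment
after = log.append(b"y" * 10)  # step 5: new segment starts at a, not 0
```

In this model the consumer's recorded offset a from step 2 is still fetchable after step 5, because the new segment begins at a rather than at 0.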

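Jun's description of how a consumer picks its starting point can be sketched in the same spirit. The function name resolve_start_offset, its parameters, and the "smallest"/"largest" policy strings are assumptions made for illustration; this is not the consumer's actual API.

```python
from typing import Optional


def resolve_start_offset(
    checkpointed: Optional[int],
    smallest: int,
    largest: int,
    reset_to: str = "smallest",
) -> int:
    """Pick the offset a consumer starts from (hypothetical helper).

    checkpointed: last offset found in ZK, or None if nothing is stored there.
    smallest/largest: the range of offsets currently available in the broker.
    reset_to: which end of that range to use when there is no checkpoint.
    """
    if checkpointed is not None:
        # A checkpoint in ZK always wins.
        return checkpointed
    if reset_to == "smallest":
        return smallest
    if reset_to == "largest":
        return largest
    raise ValueError("reset_to must be 'smallest' or 'largest'")
```

With no checkpoint, starting from the smallest offset replays whatever the broker still holds, while starting from the largest skips it, which is the trade-off Inder asks about for consumers that run without ZK.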