I don't use the high-level consumers, just the low-level ones.  Here is the
scenario I was thinking of.  Let's assume I have turned off ZK in my setup.

1) Send 1 message to topic A.  Kafka creates a directory and a log segment
for A.  The log segment starts at offset 0.  Now the "last offset" of the
topic is a.

2) A consumer reads the message from topic A and records that the most
recent offset in topic A is a.

3) Much time passes, the cleaner runs, and it deletes the log segment.

4) More time passes, I restart Kafka.  Kafka sees the topic A directory,
but has no segment file to initialize from.  So the "last offset" is
considered to be 0.

5) Send 1 message to topic A.  Kafka creates a log segment for A starting
at 0.   The new last offset of the topic is a'.

6) The consumer from step 2 tries to read from Kafka at offset a, but this
is now an invalid offset.

Does that sound right?  I haven't tried this yet; I'm just doing a thought
experiment to figure out what would happen.
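
To make the steps above concrete, here is a minimal toy model in Python.  It
is only a sketch of the behaviour I am hypothesising, not Kafka's actual
code: ToyBroker and all of its methods are made-up names, offsets are modelled
as byte positions, and the "reset to 0 when no segment file exists" rule is
exactly the assumption from step 4.

```python
class ToyBroker:
    """Hypothetical single-topic broker; offsets are byte positions."""

    def __init__(self):
        self.segments = {}    # segment start offset -> list of payloads
        self.next_offset = 0  # the "last offset" of the topic

    def send(self, payload: bytes) -> int:
        # Create a segment at the current end of the log if none exists.
        if not self.segments:
            self.segments[self.next_offset] = []
        self.segments[max(self.segments)].append(payload)
        self.next_offset += len(payload)
        return self.next_offset

    def clean(self):
        # The cleaner deletes segment files; the topic directory remains.
        self.segments.clear()

    def restart(self):
        # Assumption from step 4: the offset is rebuilt from the segment
        # files on disk, and falls back to 0 when there are none.
        if self.segments:
            start = max(self.segments)
            self.next_offset = start + sum(len(m) for m in self.segments[start])
        else:
            self.next_offset = 0

    def is_valid(self, offset: int) -> bool:
        earliest = min(self.segments) if self.segments else self.next_offset
        return earliest <= offset <= self.next_offset


broker = ToyBroker()
a = broker.send(b"first message")   # step 1
consumer_offset = a                 # step 2: consumer records offset a
broker.clean()                      # step 3
broker.restart()                    # step 4
a_prime = broker.send(b"second")    # step 5
print(a, a_prime, broker.is_valid(consumer_offset))  # step 6
```

In this toy model the stale offset a is rejected only because a' happens to
be smaller than a.  If the second message were longer than the first, a would
fall inside the new segment and the model would accept it, which seems like
the more worrying variant of the same problem.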




On Thu, Nov 17, 2011 at 11:01 PM, Jun Rao <jun...@gmail.com> wrote:

> This is true for the high-level ZK-based consumer.
>
> Jun
>
> On Thu, Nov 17, 2011 at 10:59 PM, Inder Pall <inder.p...@gmail.com> wrote:
>
> > Jun & Taylor,
> > would it be right to say that consumers without ZK won't be a viable
> > option if you can't handle replay of old messages in your application.
> >
> > - inder
> >
> > On Fri, Nov 18, 2011 at 12:27 PM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > Taylor,
> > >
> > > When you start a consumer, it always tries to get the last checkpointed
> > > offset from ZK. If no offset can be found in ZK, the consumer starts
> > > from either the smallest or the largest available offset in the broker.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Nov 17, 2011 at 9:20 PM, Taylor Gautier <tgaut...@tagged.com> wrote:
> > >
> > > > hmmm - and if you turn off zookeeper?
> > > >
> > > > On Thu, Nov 17, 2011 at 9:15 PM, Inder Pall <inder.p...@gmail.com> wrote:
> > > >
> > > > > The consumer offsets are stored in ZooKeeper by topic and partition.
> > > > > That's how in a consumer fail over scenario you don't get duplicate
> > > > > messages
> > > > >
> > > > > - Inder
> > > > >
> > > > > On Fri, Nov 18, 2011 at 10:33 AM, Taylor Gautier <tgaut...@tagged.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > We've noticed that the cleaner script in Kafka removes empty log
> > > > > > segments but not the directories themselves.  I am actually
> > > > > > wondering something - I always assumed that Kafka could restore
> > > > > > the latest offset for existing topics by scanning the log
> > > > > > directory for all directories and scanning the directories for
> > > > > > log segment files to restore the latest offset.
> > > > > >
> > > > > > Now this conclusion I have made simply by observation - so it
> > > > > > could be entirely wrong.
> > > > > >
> > > > > > My question is however - if I am right, and the cleaner removes
> > > > > > all the log segments for a given topic so that a given topic
> > > > > > directory is empty, how does Kafka behave when restarted?  How
> > > > > > does it know what the next offset should be?
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Inder
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Inder
> >
>
