Evan,

We don't roll back offsets at the moment. Since the offset is a 64-bit
long, it can last for a really long time. If you write 1TB a day, you can
keep going for over 8 million days (2^63 bytes / 2^40 bytes per day is
about 8.4 million days, or roughly 23,000 years).
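
To make the arithmetic concrete, here is a minimal sketch in plain Java;
nothing Kafka-specific, and the 1TB/day rate is just the figure from above:

public class OffsetHeadroom {
    public static void main(String[] args) {
        long maxOffset = Long.MAX_VALUE;      // 2^63 - 1, the largest value a signed long can hold
        long bytesPerDay = 1L << 40;          // 1TB written per day, counted in bytes

        long days = maxOffset / bytesPerDay;  // whole days before the offset could wrap
        System.out.println("days until overflow:  " + days);        // ~8.4 million
        System.out.println("years until overflow: " + days / 365);  // ~23,000
    }
}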

Plus, you can always use more partitions (each partition has its own
offset).
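
To illustrate that last point, a low-level consumer only needs one counter
per (topic, partition) pair. The class below is illustrative bookkeeping on
the consumer side, not Kafka's API:

import java.util.HashMap;
import java.util.Map;

// Illustrative only: a low-level consumer that skips ZK can keep one counter
// per (topic, partition) pair, since each partition advances its own offset
// independently of the others.
public class PartitionOffsets {
    private final Map<String, Long> lastConsumed = new HashMap<String, Long>();

    private static String key(String topic, int partition) {
        return topic + "-" + partition;
    }

    public long get(String topic, int partition) {
        Long offset = lastConsumed.get(key(topic, partition));
        return offset == null ? 0L : offset;
    }

    public void advance(String topic, int partition, long nextOffset) {
        lastConsumed.put(key(topic, partition), nextOffset);
    }
}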

Thanks,

Jun

On Fri, Nov 18, 2011 at 9:40 AM, Evan Chan <e...@ooyala.com> wrote:

> Jun,
>
> How do offsets keep increasing?  Eventually they have to roll over back
> to 0, right?  What happens if Kafka runs for months and the offset
> eventually rolls over?
>
> On Fri, Nov 18, 2011 at 8:04 AM, Jun Rao <jun...@gmail.com> wrote:
>
> > 4) is incorrect. "Last offset" remains 'a' even after the data is
> > cleaned. So in 5), the offset will be 2 x 'a'. That is, we never recycle
> > offsets. They keep increasing.
> >
> > Thanks,
> >
> > Jun
> >
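A tiny worked example of the byte-offset arithmetic behind Jun's reply (the
step numbers refer to Taylor's scenario quoted further down); the message
size used here is made up:

public class NeverRecycled {
    public static void main(String[] args) {
        final long SIZE = 100;       // made-up on-disk size of one message, in bytes

        long lastOffset = 0;
        lastOffset += SIZE;          // step 1: first message, last offset = a
        long a = lastOffset;

        // steps 3-4: the cleaner deletes the segment, but the last offset is
        // remembered rather than reset.

        lastOffset += SIZE;          // step 5: next message, last offset = 2 x a
        System.out.println(lastOffset == 2 * a);   // prints true
    }
}
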
> > On Fri, Nov 18, 2011 at 7:02 AM, Taylor Gautier <tgaut...@tagged.com>
> > wrote:
> >
> > > I don't use high-level consumers, just the low-level one.  What I was
> > > thinking was the following.  Let's assume I have turned off ZK in my
> > > setup.
> > >
> > > 1) Send 1 message to topic A.  Kafka creates a directory and log
> > > segment for A.  The log segment starts at 0.  Now, the "last offset"
> > > of the topic is a.
> > >
> > > 2) A consumer reads the message from topic A, and records that the
> > > most recent offset in topic A is a.
> > >
> > > 3) Much time passes, the cleaner runs, and deletes the log segment.
> > >
> > > 4) More time passes, I restart Kafka.  Kafka sees the topic A
> > > directory, but has no segment file to initialize from.  So the "last
> > > offset" is considered to be 0.
> > >
> > > 5) Send 1 message to topic A.  Kafka creates a log segment for A
> > > starting at 0.  The new last offset of the topic is a'.
> > >
> > > 6) The consumer from step 2 tries to read from Kafka at offset a, but
> > > this is now an invalid offset.
> > >
> > > Does that sound right?  I haven't tried this yet; I'm just doing a
> > > thought experiment here to try to figure out what would happen.
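
For step 6, here is a sketch of how a low-level consumer might recover once
its saved offset is rejected. The BrokerClient interface is hypothetical (it
only stands in for "fetch at an offset" and "ask for the earliest offset
still on the broker"), so treat this as the shape of the logic rather than
real Kafka API:

// Hypothetical interface, not Kafka's API.
interface BrokerClient {
    byte[] fetch(String topic, int partition, long offset) throws InvalidOffsetException;
    long earliestOffset(String topic, int partition);
}

class InvalidOffsetException extends Exception {}

class ResettingConsumer {
    private final BrokerClient broker;
    private long offset;   // the saved offset 'a' from step 2

    ResettingConsumer(BrokerClient broker, long savedOffset) {
        this.broker = broker;
        this.offset = savedOffset;
    }

    byte[] poll(String topic, int partition) throws InvalidOffsetException {
        try {
            return broker.fetch(topic, partition, offset);
        } catch (InvalidOffsetException e) {
            // The segment holding 'offset' is gone (cleaned, or the topic was
            // recreated as in step 5). Fall back to the earliest offset the
            // broker still has and accept that anything in between is lost.
            offset = broker.earliestOffset(topic, partition);
            return broker.fetch(topic, partition, offset);
        }
    }
}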
> > >
> > >
> > >
> > >
> > > On Thu, Nov 17, 2011 at 11:01 PM, Jun Rao <jun...@gmail.com> wrote:
> > >
> > > > This is true for the high-level ZK-based consumer.
> > > >
> > > > Jun
> > > >
> > > > On Thu, Nov 17, 2011 at 10:59 PM, Inder Pall <inder.p...@gmail.com> wrote:
> > > >
> > > > > Jun & Taylor,
> > > > > would it be right to say that consumers without ZK won't be a
> > > > > viable option if you can't handle replay of old messages in your
> > > > > application?
> > > > >
> > > > > - inder
> > > > >
> > > > > On Fri, Nov 18, 2011 at 12:27 PM, Jun Rao <jun...@gmail.com> wrote:
> > > > >
> > > > > > Taylor,
> > > > > >
> > > > > > When you start a consumer, it always tries to get the last
> > > > > > checkpointed offset from ZK. If no offset can be found in ZK, the
> > > > > > consumer starts from either the smallest or the largest available
> > > > > > offset in the broker.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
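The startup rule Jun describes just above can be sketched as follows.
OffsetStore and OffsetQuery are made-up stand-ins for the ZK checkpoint
lookup and the broker's offset query, and fromSmallest stands in for
whatever setting picks smallest vs. largest; this is the shape of the
logic, not the actual consumer code:

// Made-up types standing in for the ZK checkpoint lookup and the broker's
// offset query.
interface OffsetStore { Long lastCommitted(String topic, int partition); }  // null if none
interface OffsetQuery {
    long smallest(String topic, int partition);
    long largest(String topic, int partition);
}

class StartupOffset {
    // Prefer the offset checkpointed in ZK; otherwise fall back to the
    // smallest or largest offset currently available on the broker.
    static long choose(OffsetStore zk, OffsetQuery broker,
                       String topic, int partition, boolean fromSmallest) {
        Long checkpointed = zk.lastCommitted(topic, partition);
        if (checkpointed != null) {
            return checkpointed;
        }
        return fromSmallest ? broker.smallest(topic, partition)
                            : broker.largest(topic, partition);
    }
}
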
> > > > > > On Thu, Nov 17, 2011 at 9:20 PM, Taylor Gautier <tgaut...@tagged.com> wrote:
> > > > > >
> > > > > > > hmmm - and if you turn off zookeeper?
> > > > > > >
> > > > > > > On Thu, Nov 17, 2011 at 9:15 PM, Inder Pall <inder.p...@gmail.com> wrote:
> > > > > > >
> > > > > > > > The consumer offsets are stored in ZooKeeper by topic and
> > > > > > > > partition.  That's how, in a consumer failover scenario, you
> > > > > > > > don't get duplicate messages.
> > > > > > > >
> > > > > > > > - Inder
> > > > > > > >
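A quick way to see those per-topic, per-partition offsets is to read them
straight out of ZooKeeper. The path layout below
(/consumers/<group>/offsets/<topic>/<partition node>) is my recollection of
the high-level consumer's convention and the exact node naming can differ
between versions; the group and topic names are made up:

import org.apache.zookeeper.ZooKeeper;

public class ZkOffsetPeek {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);
        // Assumed path layout; adjust the group, topic and partition node to
        // whatever you actually see under /consumers in your cluster.
        String path = "/consumers/my-group/offsets/topicA/0";
        byte[] data = zk.getData(path, false, null);
        System.out.println("committed offset: " + new String(data, "UTF-8"));
        zk.close();
    }
}
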
> > > > > > > > On Fri, Nov 18, 2011 at 10:33 AM, Taylor Gautier <tgaut...@tagged.com> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > We've noticed that the cleaner script in Kafka removes
> > > > > > > > > empty log segments but not the directories themselves.  I
> > > > > > > > > am actually wondering something - I always assumed that
> > > > > > > > > Kafka could restore the latest offset for existing topics
> > > > > > > > > by scanning the log directory for all topic directories and
> > > > > > > > > scanning those directories for log segment files.
> > > > > > > > >
> > > > > > > > > Now this conclusion I have made simply by observation - so
> > > > > > > > > it could be entirely wrong.
> > > > > > > > >
> > > > > > > > > My question, however, is: if I am right, and the cleaner
> > > > > > > > > removes all the log segments for a given topic so that the
> > > > > > > > > topic directory is empty, how does Kafka behave when
> > > > > > > > > restarted?  How does it know what the next offset should be?
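
For what it's worth, here is a sketch of the recovery rule being assumed
above. It presumes what I recall of the 0.7-era layout (byte offsets,
segment files named by their starting offset with a ".kafka" suffix, topic
directories under the log dir), so treat the names as assumptions. If the
cleaner has removed every segment file, the scan finds nothing and the only
answer this scheme can give is 0, which is exactly the case being asked
about:

import java.io.File;
import java.io.FilenameFilter;

public class NextOffsetFromDisk {
    // Derive the next offset from what is on disk: take the segment file with
    // the highest starting offset and add its length (offsets here are byte
    // offsets). Returns 0 when there are no segment files at all.
    static long nextOffset(File topicDir) {
        File[] segments = topicDir.listFiles(new FilenameFilter() {
            public boolean accept(File dir, String name) {
                return name.endsWith(".kafka");
            }
        });
        long next = 0L;
        if (segments == null) {
            return next;   // directory does not exist or is not readable
        }
        for (File segment : segments) {
            String name = segment.getName();
            long start = Long.parseLong(name.substring(0, name.length() - ".kafka".length()));
            next = Math.max(next, start + segment.length());
        }
        return next;
    }

    public static void main(String[] args) {
        // Example path only; point this at a real topic-partition directory.
        System.out.println(nextOffset(new File("/tmp/kafka-logs/topicA-0")));
    }
}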
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > -- Inder
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Inder
> > > > >
> > > >
> > >
> >
>
>
>
> --
> *Evan Chan*
> Senior Software Engineer |
> e...@ooyala.com | (650) 996-4600
> www.ooyala.com | blog <http://www.ooyala.com/blog> |
> @ooyala<http://www.twitter.com/ooyala>
>
