Evan,

We don't roll back offsets at the moment. Since the offset is a long, it can last for a really long time. If you write 1 TB a day, you can keep going for about 4 million days.

Plus, you can always use more partitions (each partition has its own offset).

Thanks,

Jun
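(A quick back-of-the-envelope check of that headroom figure. The exact number depends on what you assume: the sketch below takes the offset to be a signed 64-bit byte count and 1 TB to mean 2^40 bytes, and the class name is just for illustration, but it lands in the millions of days either way.)

    // Rough headroom of a signed 64-bit byte offset at a sustained 1 TB/day.
    // Illustrative arithmetic only; nothing here is taken from Kafka itself.
    public class OffsetHeadroom {
        public static void main(String[] args) {
            long maxOffset = Long.MAX_VALUE;       // 2^63 - 1 bytes
            long bytesPerDay = 1L << 40;           // assume 1 TB/day = 2^40 bytes
            long days = maxOffset / bytesPerDay;   // roughly 8.4 million days
            System.out.println("Days until a 64-bit byte offset wraps: " + days);
            System.out.println("That is roughly " + (days / 365) + " years");
        }
    }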
On Fri, Nov 18, 2011 at 9:40 AM, Evan Chan <e...@ooyala.com> wrote:

> Jun,
>
> How do offsets keep increasing? Eventually they have to roll over back to
> 0, right? What happens if Kafka runs for months and the offset eventually
> rolls back?
>
> On Fri, Nov 18, 2011 at 8:04 AM, Jun Rao <jun...@gmail.com> wrote:
>
> > 4) is incorrect. The "last offset" remains 'a' even after the data is
> > cleaned. So in 5), the offset will be 2 x 'a'. That is, we never recycle
> > offsets. They keep increasing.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Nov 18, 2011 at 7:02 AM, Taylor Gautier <tgaut...@tagged.com>
> > wrote:
> >
> > > I don't use high-level consumers - just low-level. What I was thinking
> > > was the following. Let's assume I have turned off ZK in my setup.
> > >
> > > 1) Send 1 message to topic A. Kafka creates a directory and log
> > > segment for A. The log segment starts at 0. Now, the "last offset"
> > > of the topic is a.
> > >
> > > 2) A consumer reads the message from topic A and records that the
> > > most recent offset in topic A is a.
> > >
> > > 3) Much time passes, the cleaner runs, and deletes the log segment.
> > >
> > > 4) More time passes, and I restart Kafka. Kafka sees the topic A
> > > directory, but has no segment file to initialize from, so the
> > > "last offset" is considered to be 0.
> > >
> > > 5) Send 1 message to topic A. Kafka creates a log segment for A
> > > starting at 0. The new last offset of the topic is a'.
> > >
> > > 6) The consumer from step 2 tries to read from Kafka at offset a,
> > > but this is now an invalid offset.
> > >
> > > Does that sound right? I haven't tried this yet; I'm just doing a
> > > thought experiment here to try to figure out what would happen.
> > >
> > > On Thu, Nov 17, 2011 at 11:01 PM, Jun Rao <jun...@gmail.com> wrote:
> > >
> > > > This is true for the high-level ZK-based consumer.
> > > >
> > > > Jun
> > > >
> > > > On Thu, Nov 17, 2011 at 10:59 PM, Inder Pall <inder.p...@gmail.com>
> > > > wrote:
> > > >
> > > > > Jun & Taylor,
> > > > > would it be right to say that consumers without ZK won't be a
> > > > > viable option if you can't handle replay of old messages in your
> > > > > application?
> > > > >
> > > > > - inder
> > > > >
> > > > > On Fri, Nov 18, 2011 at 12:27 PM, Jun Rao <jun...@gmail.com> wrote:
> > > > >
> > > > > > Taylor,
> > > > > >
> > > > > > When you start a consumer, it always tries to get the last
> > > > > > checkpointed offset from ZK. If no offset can be found in ZK,
> > > > > > the consumer starts from either the smallest or the largest
> > > > > > available offset in the broker.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Thu, Nov 17, 2011 at 9:20 PM, Taylor Gautier
> > > > > > <tgaut...@tagged.com> wrote:
> > > > > >
> > > > > > > hmmm - and if you turn off zookeeper?
> > > > > > >
> > > > > > > On Thu, Nov 17, 2011 at 9:15 PM, Inder Pall
> > > > > > > <inder.p...@gmail.com> wrote:
> > > > > > >
> > > > > > > > The consumer offsets are stored in ZooKeeper by topic and
> > > > > > > > partition. That's how, in a consumer fail-over scenario,
> > > > > > > > you don't get duplicate messages.
> > > > > > > >
> > > > > > > > - Inder
> > > > > > > >
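(Concretely, those per-topic/partition checkpoints are plain znodes and can be inspected with the stock ZooKeeper Java client. The sketch below assumes a 0.7-era node layout of /consumers/<group>/offsets/<topic>/<brokerId>-<partition>; the exact layout can differ between Kafka versions, and the connect string, group, topic, and partition names are made up for illustration.)

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Reads one consumer group's checkpointed offset straight out of ZooKeeper.
    // The path layout and the names "mygroup"/"mytopic"/"0-0" are assumptions.
    public class OffsetPeek {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);
            String path = "/consumers/mygroup/offsets/mytopic/0-0"; // <brokerId>-<partition>
            Stat stat = new Stat();
            byte[] data = zk.getData(path, false, stat);
            System.out.println("checkpointed offset = " + new String(data, "UTF-8"));
            zk.close();
        }
    }

(A consumer that finds no such node falls back to the smallest or largest available offset in the broker, as Jun notes above.)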
> > > > > > > > On Fri, Nov 18, 2011 at 10:33 AM, Taylor Gautier
> > > > > > > > <tgaut...@tagged.com> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > We've noticed that the cleaner script in Kafka removes
> > > > > > > > > empty log segments but not the directories themselves. I
> > > > > > > > > am actually wondering something - I always assumed that
> > > > > > > > > Kafka could restore the latest offset for existing topics
> > > > > > > > > by scanning the log directory for all topic directories
> > > > > > > > > and scanning those directories for log segment files.
> > > > > > > > >
> > > > > > > > > Now, this conclusion I have made simply by observation,
> > > > > > > > > so it could be entirely wrong.
> > > > > > > > >
> > > > > > > > > My question is, however: if I am right, and the cleaner
> > > > > > > > > removes all the log segments for a given topic so that
> > > > > > > > > the topic directory is empty, how does Kafka behave when
> > > > > > > > > restarted? How does it know what the next offset should
> > > > > > > > > be?
> > > > > > > >
> > > > > > > > --
> > > > > > > > -- Inder
> > > > >
> > > > > --
> > > > > -- Inder
>
> --
> --
> *Evan Chan*
> Senior Software Engineer |
> e...@ooyala.com | (650) 996-4600
> www.ooyala.com | blog <http://www.ooyala.com/blog> |
> @ooyala <http://www.twitter.com/ooyala>
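(To make Jun's correction of steps 4) and 5) concrete, here is a toy model of the bookkeeping the thread describes. This is not Kafka code, just an illustration: the broker keeps the end offset even after the cleaner deletes every segment, so new writes continue from 'a' rather than restarting at 0, and only fetches below the earliest retained offset fail.)

    // Toy model of the byte-offset bookkeeping described in the thread; not Kafka code.
    public class OffsetToyModel {
        long logEndOffset = 0;    // next offset to assign (the "last offset" in the thread)
        long logStartOffset = 0;  // earliest offset still on disk

        void append(long messageSizeBytes) {
            logEndOffset += messageSizeBytes;   // offsets only ever grow
        }

        void cleanerDeletesEverything() {
            logStartOffset = logEndOffset;      // data is gone, but the end offset is kept
        }

        String fetch(long consumerOffset) {
            if (consumerOffset < logStartOffset || consumerOffset > logEndOffset) {
                return "OffsetOutOfRange";      // a fetch into cleaned data is rejected
            }
            return "ok";
        }

        public static void main(String[] args) {
            OffsetToyModel log = new OffsetToyModel();
            log.append(100);                  // step 1: last offset is now a = 100
            long a = log.logEndOffset;
            log.cleanerDeletesEverything();   // step 3: segment deleted, end offset survives
            log.append(100);                  // step 5: new last offset is 2 * a, not a again
            System.out.println("log end offset = " + log.logEndOffset + " (2 * a = " + (2 * a) + ")");
            System.out.println("fetch at offset a:            " + log.fetch(a));
            System.out.println("fetch at offset 0 (cleaned):  " + log.fetch(0));
        }
    }

(Running it prints a log end offset of 2 * a, shows that the step-2 consumer's offset a is still usable, and shows that an offset pointing at cleaned data comes back out of range.)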