I don't use the high-level consumers - just the low-level ones. Here is what I was thinking. Let's assume I have turned off ZK in my setup.

1) Send 1 message to topic A. Kafka creates a directory and log segment for A. The log segment starts at 0. Now, the "last offset" of the topic is a.
2) A consumer reads from topic A the message, and records that the most recent offset in topic A is a.
3) Much time passes, the cleaner runs, and deletes the log segment.
4) More time passes, I restart Kafka. Kafka sees the topic A directory, but has no segment file to initialize from. So the "last offset" is considered to be 0.
5) Send 1 message to topic A. Kafka creates a log segment for A starting at 0. The new last offset of the topic is a'.
6) The consumer from step 2 tries to read from Kafka at offset a, but this is now an invalid offset.

Does that sound right? I haven't tried this yet, I'm just doing a thought experiment here to try to figure out what would happen.

On Thu, Nov 17, 2011 at 11:01 PM, Jun Rao <jun...@gmail.com> wrote:

> This is true for the high-level ZK-based consumer.
>
> Jun
>
> On Thu, Nov 17, 2011 at 10:59 PM, Inder Pall <inder.p...@gmail.com> wrote:
>
> > Jun & Taylor,
> > would it be right to say that consumers without ZK won't be a viable option
> > if you can't handle replay of old messages in your application.
> >
> > - inder
> >
> > On Fri, Nov 18, 2011 at 12:27 PM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > Taylor,
> > >
> > > When you start a consumer, it always tries to get the last checkpointed
> > > offset from ZK. If no offset can be found in ZK, the consumer starts from
> > > either the smallest or the largest available offset in the broker.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Nov 17, 2011 at 9:20 PM, Taylor Gautier <tgaut...@tagged.com> wrote:
> > >
> > > > hmmm - and if you turn off zookeeper?
> > > >
> > > > On Thu, Nov 17, 2011 at 9:15 PM, Inder Pall <inder.p...@gmail.com> wrote:
> > > >
> > > > > The consumer offsets are stored in ZooKeeper by topic and partition.
> > > > > That's how in a consumer fail over scenario you don't get duplicate
> > > > > messages
> > > > >
> > > > > - Inder
> > > > >
> > > > > On Fri, Nov 18, 2011 at 10:33 AM, Taylor Gautier <tgaut...@tagged.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > We've noticed that the cleaner script in Kafka removes empty log segments
> > > > > > but not the directories themselves. I am actually wondering something - I
> > > > > > always assumed that Kafka could restore the latest offset for existing
> > > > > > topics by scanning the log directory for all directories and scanning the
> > > > > > directories for log segment files to restore the latest offset.
> > > > > >
> > > > > > Now this conclusion I have made simply by observation - so it could be
> > > > > > entirely wrong.
> > > > > >
> > > > > > My question is however - if I am right, and the cleaner removes all the log
> > > > > > segments for a given topic so that a given topic directory is empty, how
> > > > > > does Kafka behave when restarted? How does it know what the next offset
> > > > > > should be?
> > > > >
> > > > > --
> > > > > -- Inder
> >
> > --
> > -- Inder
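
To make the six-step thought experiment at the top of this message easier to reason about, here is a toy Python model of it. This is only a sketch and not Kafka code: `ToyTopicLog` and all of its methods are made-up names, it assumes offsets are byte positions in the log, and it assumes that a restart with an empty topic directory resets the "last offset" to 0, which is exactly the behaviour being asked about and has not been verified.

```python
from typing import Optional


class ToyTopicLog:
    """Hypothetical stand-in for one topic directory. Not Kafka's implementation."""

    def __init__(self) -> None:
        # None models "the directory exists but holds no segment file".
        self.segment_start: Optional[int] = None
        self.segment_bytes = b""
        # The broker's in-memory idea of the topic's "last offset".
        self.end_offset = 0

    def append(self, payload: bytes) -> int:
        """Append one message and return the new last offset."""
        if self.segment_start is None:
            # A new segment starts wherever the broker believes the log ends.
            self.segment_start = self.end_offset
            self.segment_bytes = b""
        self.segment_bytes += payload
        self.end_offset = self.segment_start + len(self.segment_bytes)
        return self.end_offset

    def clean(self) -> None:
        """Model the cleaner: delete the segment file, keep the directory."""
        self.segment_start = None
        self.segment_bytes = b""

    def restart(self) -> None:
        """Model a restart. ASSUMPTION: with no segment file, start again at 0."""
        if self.segment_start is None:
            self.end_offset = 0
        else:
            self.end_offset = self.segment_start + len(self.segment_bytes)

    def is_valid_fetch_offset(self, offset: int) -> bool:
        """An offset is fetchable only if it lies inside the surviving segment."""
        if self.segment_start is None:
            return False
        return self.segment_start <= offset <= self.end_offset


log = ToyTopicLog()
a = log.append(b"first message")   # step 1: last offset becomes a
consumer_offset = a                # step 2: consumer records a
log.clean()                        # step 3: cleaner deletes the segment
log.restart()                      # step 4: broker restarts and assumes 0
a_prime = log.append(b"hi")        # step 5: new segment starts at 0 again
still_valid = log.is_valid_fetch_offset(consumer_offset)  # step 6
print(a, a_prime, still_valid)
```

In this model the second message is shorter than the first, so a' < a and the consumer's saved offset falls outside the new segment. If the two messages happened to be the same size, a' would equal a and the saved offset would look valid while pointing past the new message, so under the same assumptions the consumer could silently skip it rather than get an error.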