Jun, I think Ed is suggesting a good improvement to the design doc: line 203 on http://svn.apache.org/viewvc/incubator/kafka/site/design.html?view=markup
That paragraph does seem to mix up the discussion between the high-level consumer and consumers that maintain their own state for more fine-grained "rewindability". At least, the first two lines of that paragraph seem to be talking about the high-level consumer, but not very clearly.

Thanks,

Joel

On Fri, Apr 13, 2012 at 12:01 PM, Edward Smith <esm...@stardotstar.org> wrote:
> Ack! No! I'm sorry, I'm probably just confusing the issue. I just
> want to clarify the docs, not change the functionality.
>
> Maybe I'll try to sum it up the way I would write the jira:
>
> "Design.html is confusing to new users when it comes to where offset
> data is stored by consumers."
>
> On Fri, Apr 13, 2012 at 2:48 PM, Jun Rao <jun...@gmail.com> wrote:
> > Ed,
> >
> > It seems that you are proposing a pluggable consumer offset store.
> > We don't have that now. Could you open a jira for that?
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Apr 13, 2012 at 11:27 AM, Edward Smith <esm...@stardotstar.org> wrote:
> >> Jun,
> >>
> >> Let me try to rephrase to see if I can get this point across more
> >> clearly.
> >>
> >> I've been exploring the design by running the console tools. The
> >> console consumer stores offset data in ZK. This appears to be the
> >> default behavior in a Kafka deployment. For example, if you skip
> >> down to "Consumers and Consumer Groups", it says that offsets are
> >> stored in ZK.
> >>
> >> This paragraph that I want to change is basically describing an
> >> alternative technique of tracking offsets. It has been confusing
> >> to me as I've tried to understand the design of Kafka, so I want
> >> to see if we can clarify it somehow.
> >>
> >> Ed
> >>
> >> On Fri, Apr 13, 2012 at 1:39 PM, Jun Rao <jun...@gmail.com> wrote:
> >> > Ed,
> >> >
> >> > The design page only describes how the high-level consumer
> >> > (which most people use) works. The high-level consumer
> >> > currently doesn't expose offsets. Hadoop uses the low-level
> >> > consumer (SimpleConsumer), which is not described. We can have
> >> > a wiki describing it and put your content there.
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> > On Fri, Apr 13, 2012 at 10:24 AM, Edward Smith <esm...@stardotstar.org> wrote:
> >> >> Sorry.... here it is with more clarity:
> >> >>
> >> >> Basically I'm adding to the beginning of the 2nd section
> >> >> titled "Consumer State":
> >> >>
> >> >> ----------------------------------------
> >> >> <h3>Consumer State</h3> (the second heading like this in the file)
> >> >> <p>
> >> >> In Kafka, the consumers are responsible for maintaining state
> >> >> information on what has been consumed. The core Kafka consumers
> >> >> write their state data to zookeeper.
> >> >> </p>
> >> >> <p>
> >> >> However, it may be beneficial for consumers to write state data into
> >> >> the same datastore where they are writing the results of their
> >> >> processing. For example, the consumer may simply be entering some
> >> >> aggregate value into a centralized......
> >> >> ..
> >> >> (rest of section remains the same from here)
> >> >> ..
> >> >> </p>
> >> >> ------------------------------------------
> >> >>
> >> >> On Fri, Apr 13, 2012 at 1:16 PM, Jun Rao <jun...@gmail.com> wrote:
> >> >> > Ed,
> >> >> >
> >> >> > I don't see the change you want to make. The Apache mailing
> >> >> > list doesn't take attachments. If you have attachments, the
> >> >> > easiest way is probably to attach that to a jira.
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > Jun
> >> >> >
> >> >> > On Fri, Apr 13, 2012 at 10:04 AM, Edward Smith <esm...@stardotstar.org> wrote:
> >> >> >> I didn't want to open up a bug unless there was some
> >> >> >> concurrence on this. Please review the change below and see
> >> >> >> if I'm just misunderstanding things or not.
> >> >> >> This paragraph in the doc took me a long time to digest
> >> >> >> because it was describing the contrib/hadoop consumer and
> >> >> >> not how SimpleConsumer or ConsoleConsumer work:
> >> >> >>
> >> >> >> Consumer State (the second heading like this in the file)
> >> >> >>
> >> >> >> In Kafka, the consumers are responsible for maintaining state
> >> >> >> information on what has been consumed. The core Kafka
> >> >> >> consumers write their state data to zookeeper.
> >> >> >>
> >> >> >> However, it may be beneficial for consumers to write state
> >> >> >> data into the same datastore where they are writing the
> >> >> >> results of their processing. For example, the consumer may
> >> >> >> simply be entering some aggregate value into a
> >> >> >> centralized...... (rest of section remains the same from
> >> >> >> here)
> >> >> >>
> >> >> >> Ed
> >> >> >>
> >> >> >> On Fri, Apr 13, 2012 at 12:02 PM, Jun Rao <jun...@gmail.com> wrote:
> >> >> >> > Currently, as you are iterating over the messages returned
> >> >> >> > by SimpleConsumer, you also get the offset for the next
> >> >> >> > message. In the map, you can just run for 30 mins and save
> >> >> >> > the next offset for the next run.
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> >
> >> >> >> > Jun
> >> >> >> >
> >> >> >> > On Fri, Apr 13, 2012 at 1:01 AM, R S <mypostbo...@gmail.com> wrote:
> >> >> >> >> Hi,
> >> >> >> >>
> >> >> >> >> I looked at hadoop-consumer, which fetches data directly
> >> >> >> >> from the kafka broker. But from what I understand it is
> >> >> >> >> based on min and max offsets, and map tasks complete once
> >> >> >> >> they reach the maximum offset for a given topic.
> >> >> >> >>
> >> >> >> >> In our use case we would not know the max offset
> >> >> >> >> beforehand. Instead we want the map to keep reading data
> >> >> >> >> from a min offset and roll over every 30 mins. At the
> >> >> >> >> 30th minute we would again generate the offsets, which
> >> >> >> >> would be used for the next run.
> >> >> >> >>
> >> >> >> >> Any suggestions would be helpful.
> >> >> >> >>
> >> >> >> >> regards,
> >> >> >> >> rks