Ed,

It seems that you are proposing a pluggable consumer offset store. We don't
have that now. Could you open a jira for that?

Thanks,

Jun

On Fri, Apr 13, 2012 at 11:27 AM, Edward Smith <esm...@stardotstar.org> wrote:
> Jun,
>
> Let me try to rephrase to see if I can get this point across more clearly.
>
> I've been exploring the design by running the console tools. The console
> consumer stores offset data in ZK. This appears to be the default behavior
> in a Kafka deployment. For example, if you skip down to "Consumers and
> Consumer Groups", it says that offsets are stored in ZK.
>
> The paragraph that I want to change is basically describing an alternative
> technique for tracking offsets. It has been confusing to me as I've tried
> to understand the design of Kafka, so I want to see if we can clarify it
> somehow.
>
> Ed
>
> On Fri, Apr 13, 2012 at 1:39 PM, Jun Rao <jun...@gmail.com> wrote:
> > Ed,
> >
> > The design page only describes how the high-level consumer (which most
> > people use) works. The high-level consumer currently doesn't expose
> > offsets. Hadoop uses the low-level consumer (SimpleConsumer), which is
> > not described. We can have a wiki describing it and put your content
> > there.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Apr 13, 2012 at 10:24 AM, Edward Smith <esm...@stardotstar.org> wrote:
> > > Sorry... here it is with more clarity:
> > >
> > > Basically, I'm adding to the beginning of the second section titled
> > > "Consumer State":
> > >
> > > ----------------------------------------
> > > <h3>Consumer State</h3> (the second heading like this in the file)
> > > <p>
> > > In Kafka, the consumers are responsible for maintaining state
> > > information on what has been consumed. The core Kafka consumers
> > > write their state data to zookeeper.
> > > </p>
> > > <p>
> > > However, it may be beneficial for consumers to write state data into
> > > the same datastore where they are writing the results of their
> > > processing. For example, the consumer may simply be entering some
> > > aggregate value into a centralized......
> > > ..
> > > (rest of section remains the same from here)
> > > ..
> > > </p>
> > > ------------------------------------------
> > >
> > > On Fri, Apr 13, 2012 at 1:16 PM, Jun Rao <jun...@gmail.com> wrote:
> > > > Ed,
> > > >
> > > > I don't see the change you want to make. The Apache mailing list
> > > > doesn't take attachments. If you have attachments, the easiest way
> > > > is probably to attach them to a jira.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Apr 13, 2012 at 10:04 AM, Edward Smith <esm...@stardotstar.org> wrote:
> > > > > I didn't want to open up a bug unless there was some concurrence
> > > > > on this. Please review the change below and see if I'm just
> > > > > misunderstanding things or not. This paragraph in the doc took me
> > > > > a long time to digest because it was describing the contrib/hadoop
> > > > > consumer and not how SimpleConsumer or ConsoleConsumer work:
> > > > >
> > > > > Consumer State (the second heading like this in the file)
> > > > >
> > > > > In Kafka, the consumers are responsible for maintaining state
> > > > > information on what has been consumed. The core Kafka consumers
> > > > > write their state data to zookeeper.
> > > > >
> > > > > However, it may be beneficial for consumers to write state data
> > > > > into the same datastore where they are writing the results of
> > > > > their processing. For example, the consumer may simply be entering
> > > > > some aggregate value into a centralized...... (rest of section
> > > > > remains the same from here)
> > > > >
> > > > > Ed
> > > > >
> > > > > On Fri, Apr 13, 2012 at 12:02 PM, Jun Rao <jun...@gmail.com> wrote:
> > > > > > Currently, as you are iterating over the messages returned by
> > > > > > SimpleConsumer, you also get the offset for the next message. In
> > > > > > the map task, you can just run for 30 mins and save the next
> > > > > > offset for the next run.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Fri, Apr 13, 2012 at 1:01 AM, R S <mypostbo...@gmail.com> wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > I looked at hadoop-consumer, which fetches data directly from
> > > > > > > the Kafka broker. But from what I understand, it is based on
> > > > > > > min and max offsets, and the map tasks complete once they
> > > > > > > reach the maximum offset for a given topic.
> > > > > > >
> > > > > > > In our use case, we would not know the max offset beforehand.
> > > > > > > Instead, we want the map task to keep reading data from a min
> > > > > > > offset and roll over every 30 mins. At the 30th min, we would
> > > > > > > again generate the offsets to be used for the next run.
> > > > > > >
> > > > > > > Any suggestions would be helpful.
> > > > > > >
> > > > > > > Regards,
> > > > > > > rks
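The pattern Jun describes (each consumed message tells you the offset of the next one; a bounded run saves that next offset, and the following run resumes from it) combined with Ed's proposed text (commit the offset into the same datastore as the results) can be sketched roughly as below. This is only an illustrative sketch, not the real SimpleConsumer API: `fetch`, `run_once`, `CHECKPOINTS`, the `budget` parameter, and the in-memory log are all stand-ins so the checkpointing logic can be shown self-contained.

```python
# Illustrative sketch of "save the next offset for the next run".
# None of these names come from Kafka; they are placeholders.

LOG = [f"msg-{i}" for i in range(10)]   # stand-in for one partition's message log
CHECKPOINTS = {}                        # stand-in for ZK or the output datastore

def fetch(topic, offset, max_messages):
    """Return (message, next_offset) pairs, mimicking iteration over a
    fetched message set where each message also yields the offset to
    resume from after it."""
    end = min(offset + max_messages, len(LOG))
    return [(LOG[i], i + 1) for i in range(offset, end)]

def run_once(topic, budget):
    """One bounded run (e.g. one 30-minute map task, modeled here as a
    message budget). Resumes from the saved offset, then commits the
    next offset alongside the processed results."""
    offset = CHECKPOINTS.get(topic, 0)
    results = []
    for message, next_offset in fetch(topic, offset, budget):
        results.append(message)        # "process" the message
        offset = next_offset           # offset to resume from if we stop here
    CHECKPOINTS[topic] = offset        # ideally committed with the results
    return results

first = run_once("clicks", budget=4)   # consumes msg-0 .. msg-3
second = run_once("clicks", budget=4)  # resumes at msg-4
```

Writing the checkpoint into the same store as the results is what makes the rollover safe: if the run dies before the commit, the next run simply re-reads from the last committed offset.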