Ack! No! I'm sorry, I'm probably just confusing the issue. I just want to clarify the docs, not change the functionality.

Maybe I'll try to sum it up the way I would write the jira: "Design.html is
confusing to new users when it comes to where offset data is stored by
consumers."

On Fri, Apr 13, 2012 at 2:48 PM, Jun Rao <jun...@gmail.com> wrote:
> Ed,
>
> It seems that you are proposing a pluggable consumer offset store. We don't
> have that now. Could you open a jira for that?
>
> Thanks,
>
> Jun
>
> On Fri, Apr 13, 2012 at 11:27 AM, Edward Smith <esm...@stardotstar.org> wrote:
>
>> Jun,
>>
>> Let me try to rephrase to see if I can get this point across more clearly.
>>
>> I've been exploring the design by running the console tools. The
>> console consumer stores offset data in ZK. This appears to be the
>> default behavior in a Kafka deployment. For example, if you skip down
>> to "Consumers and Consumer Groups", it says that offsets are stored in
>> ZK.
>>
>> The paragraph that I want to change is basically describing an
>> alternative technique for tracking offsets. It has been confusing to
>> me as I've tried to understand the design of Kafka, so I want to see
>> if we can clarify it somehow.
>>
>> Ed
>>
>> On Fri, Apr 13, 2012 at 1:39 PM, Jun Rao <jun...@gmail.com> wrote:
>> > Ed,
>> >
>> > The design page only describes how the high level consumer (which most
>> > people use) works. The high level consumer currently doesn't expose
>> > offsets. Hadoop uses the low level consumer (SimpleConsumer), which is
>> > not described. We can have a wiki describing it and put your content
>> > there.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Fri, Apr 13, 2012 at 10:24 AM, Edward Smith <esm...@stardotstar.org> wrote:
>> >
>> >> Sorry.... here it is with more clarity:
>> >>
>> >> Basically I'm adding to the beginning of the 2nd section titled
>> >> "Consumer State":
>> >>
>> >> ----------------------------------------
>> >> <h3>Consumer State</h3> (the second heading like this in the file)
>> >> <p>
>> >> In Kafka, the consumers are responsible for maintaining state
>> >> information on what has been consumed. The core Kafka consumers
>> >> write their state data to zookeeper.
>> >> </p>
>> >> <p>
>> >> However, it may be beneficial for consumers to write state data into
>> >> the same datastore where they are writing the results of their
>> >> processing. For example, the consumer may simply be entering some
>> >> aggregate value into a centralized......
>> >> ..
>> >> (rest of section remains the same from here)
>> >> ..
>> >> </p>
>> >> ------------------------------------------
>> >>
>> >> On Fri, Apr 13, 2012 at 1:16 PM, Jun Rao <jun...@gmail.com> wrote:
>> >> > Ed,
>> >> >
>> >> > I don't see the change you want to make. The Apache mailing list
>> >> > doesn't take attachments. If you have attachments, the easiest way
>> >> > is probably to attach them to a jira.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Jun
>> >> >
>> >> > On Fri, Apr 13, 2012 at 10:04 AM, Edward Smith <esm...@stardotstar.org> wrote:
>> >> >
>> >> >> I didn't want to open up a bug unless there was some concurrence on
>> >> >> this. Please review the change below and see whether I'm just
>> >> >> misunderstanding things. This paragraph in the doc took me a
>> >> >> long time to digest because it describes the contrib/hadoop
>> >> >> consumer and not how SimpleConsumer or the console consumer work:
>> >> >>
>> >> >> Consumer State (the second heading like this in the file)
>> >> >>
>> >> >> In Kafka, the consumers are responsible for maintaining state
>> >> >> information on what has been consumed. The core Kafka consumers
>> >> >> write their state data to zookeeper.
>> >> >>
>> >> >> However, it may be beneficial for consumers to write state data into
>> >> >> the same datastore where they are writing the results of their
>> >> >> processing. For example, the consumer may simply be entering some
>> >> >> aggregate value into a centralized...... (rest of section remains
>> >> >> the same from here)
>> >> >>
>> >> >> Ed
>> >> >>
>> >> >> On Fri, Apr 13, 2012 at 12:02 PM, Jun Rao <jun...@gmail.com> wrote:
>> >> >> > Currently, as you are iterating over the messages returned by
>> >> >> > SimpleConsumer, you also get the offset for the next message. In
>> >> >> > the map, you can just run for 30 mins and save the next offset
>> >> >> > for the next run.
>> >> >> >
>> >> >> > Thanks,
>> >> >> >
>> >> >> > Jun
>> >> >> >
>> >> >> > On Fri, Apr 13, 2012 at 1:01 AM, R S <mypostbo...@gmail.com> wrote:
>> >> >> >
>> >> >> >> Hi,
>> >> >> >>
>> >> >> >> I looked at hadoop-consumer, which fetches data directly from the
>> >> >> >> kafka broker. But from what I understand, it is based on min and
>> >> >> >> max offsets, and the map tasks complete once they reach the
>> >> >> >> maximum offset for a given topic.
>> >> >> >>
>> >> >> >> In our use case we would not know the max offset beforehand.
>> >> >> >> Instead, we want the map to keep reading data from a min offset
>> >> >> >> and roll over every 30 mins. At the 30th min we would again
>> >> >> >> generate the offsets to be used for the next run.
>> >> >> >>
>> >> >> >> Any suggestions would be helpful.
>> >> >> >>
>> >> >> >> regards,
>> >> >> >> rks
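
A rough sketch of the pattern Jun describes, in Java against the 0.7-era
SimpleConsumer API: fetch from a saved offset, process for a fixed window,
then persist the next offset so the following run can resume. The host,
topic, partition, file-based offset store, and 30-minute window below are
illustrative placeholders, not part of any Kafka API:

----------------------------------------
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import kafka.api.FetchRequest;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.javaapi.message.ByteBufferMessageSet;
import kafka.message.MessageAndOffset;

public class RollingOffsetConsumer {
    public static void main(String[] args) throws Exception {
        String topic = "events";    // illustrative topic name
        int partition = 0;
        Path offsetFile = Paths.get("offset-" + topic + "-" + partition);

        // Resume from the offset saved by the previous run (0 on the first run).
        long offset = Files.exists(offsetFile)
                ? Long.parseLong(new String(Files.readAllBytes(offsetFile),
                                            StandardCharsets.UTF_8).trim())
                : 0L;

        SimpleConsumer consumer =
                new SimpleConsumer("localhost", 9092, 10000, 1024 * 1024);
        long deadline = System.currentTimeMillis() + 30 * 60 * 1000L; // 30-min window

        try {
            while (System.currentTimeMillis() < deadline) {
                ByteBufferMessageSet messages = consumer.fetch(
                        new FetchRequest(topic, partition, offset, 1024 * 1024));
                boolean gotMessages = false;
                for (MessageAndOffset mo : messages) {
                    gotMessages = true;
                    ByteBuffer payload = mo.message().payload();
                    byte[] bytes = new byte[payload.remaining()];
                    payload.get(bytes);
                    process(bytes);
                    // In 0.7, offset() is the offset to use for the *next*
                    // fetch, which is exactly the value to persist between runs.
                    offset = mo.offset();
                }
                if (!gotMessages) {
                    Thread.sleep(1000);  // caught up; back off briefly
                }
            }
        } finally {
            consumer.close();
            // Persist the next offset so the following run resumes here.
            Files.write(offsetFile,
                        Long.toString(offset).getBytes(StandardCharsets.UTF_8));
        }
    }

    private static void process(byte[] record) {
        // application-specific processing would go here
    }
}
----------------------------------------

Writing the offset to the same datastore as the processed results, as the
proposed doc text suggests, would let the offset commit happen atomically
with the output instead of through a separate file.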