Ed,

It seems that you are proposing a pluggable consumer offset store. We don't
have that now. Could you open a jira for that?

Thanks,

Jun

On Fri, Apr 13, 2012 at 11:27 AM, Edward Smith <esm...@stardotstar.org> wrote:
> Jun,
>
> Let me try to rephrase to see if I can get this point across more clearly.
>
> I've been exploring the design by running the console tools. The console
> consumer stores offset data in ZK. This appears to be the default behavior
> in a Kafka deployment. For example, if you skip down to "Consumers and
> Consumer Groups", it says that offsets are stored in ZK.
>
> The paragraph that I want to change is basically describing an alternative
> technique for tracking offsets. It has been confusing to me as I've tried
> to understand the design of Kafka, so I want to see if we can clarify it
> somehow.
>
> Ed
>
> On Fri, Apr 13, 2012 at 1:39 PM, Jun Rao <jun...@gmail.com> wrote:
> > Ed,
> >
> > The design page only describes how the high-level consumer (which most
> > people use) works. The high-level consumer currently doesn't expose
> > offsets. Hadoop uses the low-level consumer (SimpleConsumer), which is
> > not described. We can have a wiki describing it and put your content
> > there.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Apr 13, 2012 at 10:24 AM, Edward Smith <esm...@stardotstar.org> wrote:
> > > Sorry... here it is with more clarity:
> > >
> > > Basically, I'm adding to the beginning of the second section titled
> > > "Consumer State":
> > >
> > > ----------------------------------------
> > > <h3>Consumer State</h3> (the second heading like this in the file)
> > > <p>
> > > In Kafka, the consumers are responsible for maintaining state
> > > information on what has been consumed. The core Kafka consumers
> > > write their state data to zookeeper.
> > > </p>
> > > <p>
> > > However, it may be beneficial for consumers to write state data into
> > > the same datastore where they are writing the results of their
> > > processing. For example, the consumer may simply be entering some
> > > aggregate value into a centralized......
> > > ..
> > > (rest of section remains the same from here)
> > > ..
> > > </p>
> > > ------------------------------------------
> > >
> > > On Fri, Apr 13, 2012 at 1:16 PM, Jun Rao <jun...@gmail.com> wrote:
> > > > Ed,
> > > >
> > > > I don't see the change you want to make. The Apache mailing list
> > > > doesn't take attachments. If you have attachments, the easiest way
> > > > is probably to attach them to a jira.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Apr 13, 2012 at 10:04 AM, Edward Smith <esm...@stardotstar.org> wrote:
> > > > > I didn't want to open up a bug unless there was some concurrence
> > > > > on this. Please review the change below and see if I'm just
> > > > > misunderstanding things or not. This paragraph in the doc took me
> > > > > a long time to digest because it was describing the contrib/hadoop
> > > > > consumer and not how SimpleConsumer or ConsoleConsumer work:
> > > > >
> > > > > Consumer State (the second heading like this in the file)
> > > > >
> > > > > In Kafka, the consumers are responsible for maintaining state
> > > > > information on what has been consumed. The core Kafka consumers
> > > > > write their state data to zookeeper.
> > > > >
> > > > > However, it may be beneficial for consumers to write state data
> > > > > into the same datastore where they are writing the results of
> > > > > their processing. For example, the consumer may simply be entering
> > > > > some aggregate value into a centralized...... (rest of section
> > > > > remains the same from here)
> > > > >
> > > > > Ed
> > > > >
> > > > > On Fri, Apr 13, 2012 at 12:02 PM, Jun Rao <jun...@gmail.com> wrote:
> > > > > > Currently, as you are iterating over the messages returned by
> > > > > > SimpleConsumer, you also get the offset for the next message. In
> > > > > > the map task, you can just run for 30 mins and save the next
> > > > > > offset for the next run.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Fri, Apr 13, 2012 at 1:01 AM, R S <mypostbo...@gmail.com> wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > I looked at hadoop-consumer, which fetches data directly from
> > > > > > > the Kafka broker. But from what I understand, it is based on
> > > > > > > min and max offsets, and the map tasks complete once they
> > > > > > > reach the maximum offset for a given topic.
> > > > > > >
> > > > > > > In our use case, we would not know the max offset beforehand.
> > > > > > > Instead, we want the map task to keep reading data from a min
> > > > > > > offset and roll over every 30 mins. At the 30th min, we would
> > > > > > > again generate the offsets to be used for the next run.
> > > > > > >
> > > > > > > Any suggestions would be helpful.
> > > > > > >
> > > > > > > Regards,
> > > > > > > rks
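The pattern Jun describes (each consumed message tells you the offset of the next one; a bounded run saves that next offset, and the following run resumes from it) combined with Ed's proposed text (commit the offset into the same datastore as the results) can be sketched roughly as below. This is only an illustrative sketch, not the real SimpleConsumer API: `fetch`, `run_once`, `CHECKPOINTS`, the `budget` parameter, and the in-memory log are all stand-ins so the checkpointing logic can be shown self-contained.

```python
# Illustrative sketch of "save the next offset for the next run".
# None of these names come from Kafka; they are placeholders.

LOG = [f"msg-{i}" for i in range(10)]   # stand-in for one partition's message log
CHECKPOINTS = {}                        # stand-in for ZK or the output datastore

def fetch(topic, offset, max_messages):
    """Return (message, next_offset) pairs, mimicking iteration over a
    fetched message set where each message also yields the offset to
    resume from after it."""
    end = min(offset + max_messages, len(LOG))
    return [(LOG[i], i + 1) for i in range(offset, end)]

def run_once(topic, budget):
    """One bounded run (e.g. one 30-minute map task, modeled here as a
    message budget). Resumes from the saved offset, then commits the
    next offset alongside the processed results."""
    offset = CHECKPOINTS.get(topic, 0)
    results = []
    for message, next_offset in fetch(topic, offset, budget):
        results.append(message)        # "process" the message
        offset = next_offset           # offset to resume from if we stop here
    CHECKPOINTS[topic] = offset        # ideally committed with the results
    return results

first = run_once("clicks", budget=4)   # consumes msg-0 .. msg-3
second = run_once("clicks", budget=4)  # resumes at msg-4
```

Writing the checkpoint into the same store as the results is what makes the rollover safe: if the run dies before the commit, the next run simply re-reads from the last committed offset.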