Mark,

>> Is that correct? Did you mean SimpleConsumer or HighLevelConsumer? What
>> are the differences?

The high level consumer checkpoints the offsets in ZooKeeper, either
periodically or through an explicit API call (look at commitOffsets()).

If you want to checkpoint each and every message offset, exactly-once
semantics will be expensive, since calling commitOffsets() per message
leads to too many writes to ZooKeeper. But if you are willing to
tolerate a small window of duplicates, you can buffer messages and
commit the offsets in batches; that should be fine, and you can use the
high level consumer itself.
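
To make that concrete, here is a rough sketch of the batched commit
pattern, assuming the 0.7-era high level consumer Java API; exact
property names, package layout and generics vary a bit between
releases, and the group/topic names here are made up:

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.Message;

public class BatchedCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zk.connect", "localhost:2181"); // ZK ensemble where offsets are checkpointed
        props.put("groupid", "example-group");     // made-up consumer group
        props.put("autocommit.enable", "false");   // turn off periodic checkpointing; we commit ourselves

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        Map<String, Integer> topicCount = new HashMap<String, Integer>();
        topicCount.put("example-topic", 1);        // one stream for a made-up topic
        KafkaStream<Message> stream =
            connector.createMessageStreams(topicCount).get("example-topic").get(0);

        ConsumerIterator<Message> it = stream.iterator();
        int sinceLastCommit = 0;
        while (it.hasNext()) {
            it.next();                             // hand the message to the application here
            sinceLastCommit++;
            if (sinceLastCommit >= 1000) {
                // One ZK write per 1000 messages instead of one per message. A crash
                // between commits replays at most the uncommitted batch, i.e. a small
                // window of duplicates rather than lost messages.
                connector.commitOffsets();
                sinceLastCommit = 0;
            }
        }
    }
}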

On the other hand, if your consumer is writing the messages to a database
or other persistent storage, you might be better off using SimpleConsumer,
so that you can store the offsets along with the data. There was another
discussion about making the offset storage of the high level consumer
pluggable, but we don't have that feature yet.
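
For illustration, a rough sketch of that pattern with SimpleConsumer,
again assuming the 0.7-era Java API; the broker host, topic, partition
and the loadCheckpoint()/saveAtomically() helpers are made-up
placeholders for whatever store the consumer writes to:

import kafka.api.FetchRequest;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.javaapi.message.ByteBufferMessageSet;
import kafka.message.Message;
import kafka.message.MessageAndOffset;

public class CheckpointingConsumer {
    public static void main(String[] args) {
        // host, port, socket timeout (ms), receive buffer size
        SimpleConsumer consumer =
            new SimpleConsumer("broker-host", 9092, 10000, 1024 * 1024);

        // Resume from the offset recorded with the last successful write, so a
        // crash replays from the last checkpoint instead of losing messages.
        long offset = loadCheckpoint();

        while (true) {
            FetchRequest request =
                new FetchRequest("example-topic", 0, offset, 1024 * 1024);
            ByteBufferMessageSet messages = consumer.fetch(request);

            for (MessageAndOffset mo : messages) {
                // Write the message and its offset to the same store in one
                // transaction; if that write is atomic, the store sees each
                // message exactly once.
                saveAtomically(mo.message(), mo.offset());
                offset = mo.offset(); // in 0.7 this is the offset to fetch next
            }
        }
    }

    // Hypothetical helpers standing in for the consumer's own offset storage.
    private static long loadCheckpoint() { return 0L; }
    private static void saveAtomically(Message message, long nextOffset) { /* ... */ }
}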

Thanks,
Neha


On Thu, Dec 8, 2011 at 9:32 AM, Jun Rao <jun...@gmail.com> wrote:

> Currently, the high level consumer (with ZK integration) doesn't expose
> offsets to the consumer. Only SimpleConsumer does.
>
> Jun
>
> On Thu, Dec 8, 2011 at 9:15 AM, Mark <static.void....@gmail.com> wrote:
>
> > "This is only possible through SimpleConsumer right now."
> >
> >
> > Is that correct? Did you mean SimpleConsumer or HighLevelConsumer? What
> > are the differences?
> >
> >
> > On 12/8/11 8:53 AM, Jun Rao wrote:
> >
> >> Mark,
> >>
> >> Today, this is mostly the responsibility of the consumer, by managing
> >> the offsets properly. For example, if the consumer periodically flushes
> >> messages to disk, it has to checkpoint to disk the offset corresponding
> >> to the last flush. On failure, the consumer has to rewind the
> >> consumption from the last checkpointed offset. This is only possible
> >> through SimpleConsumer right now.
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Thu, Dec 8, 2011 at 8:18 AM, Mark <static.void....@gmail.com> wrote:
> >>
> >>> How can one guarantee exactly-once semantics when using Kafka as a
> >>> traditional queue? Is this guarantee the responsibility of the
> >>> consumer?
> >>>
> >>>
>
