I've started looking at Kafka as a possible replacement for Flume for moving event data into Hadoop, for on-the-fly stream processing, and for making some of our event data available to external parties via pub/sub.
The two main reasons we're looking for alternatives to Flume:

- While trying to avoid hitting the disk, there's a problem when moving events downstream and the downstream node (consumer) can't keep up and drain its socket receive buffers fast enough: in Flume this currently brings the flow to a halt. It manifests as an upstream Flume node's socket send buffer filling up and blocking the running thread. This is a bummer, as even "fan-out" nodes in Flume are currently single-threaded, so fanning out can stop completely if one of the downstream nodes gets bogged down (apparently this is addressed in the ongoing flume-ng work; I'm told the fan-out node is multi-threaded there). I was pleased to see that this drawback was specifically addressed and avoided by Kafka's design: brokers always buffer, and consumers pull as they see fit.

- Flume guarantees at-least-once event delivery in E2E and DFO modes, and at-most-once in BE mode. This means de-dupping is required if you want complete data. I'm wondering whether de-dupping could be avoided with exactly-once delivery, and it seems to me that, *at least for broker -> consumer communication*, exactly-once is what you get with Kafka. (It's not that Kafka provides this explicitly; it's that one can build consumers that do -- the first sketch in the P.S. below shows what I have in mind.)

However, once I started looking at the producer API, I ran into a question that my (so far) brief perusal of the source has not yet answered -- maybe you can help.

The Kafka producer API doesn't appear to provide explicit guarantees of successful reception of messages at the broker end. Both http://readthedocs.org/docs/brod/en/latest/spec.html and https://issues.apache.org/jira/browse/KAFKA-49 confirm this, and upon reading about the solution being worked on in KAFKA-49, I'm wondering whether the simple protocol being implemented will in fact guarantee exactly-once delivery. Is this the goal? If the acknowledgment from the broker gets lost (e.g., the producer sends a message successfully, the broker sends an ack, but the producer goes down before receiving and bookkeeping the ack), then it seems to me the producer would end up with slightly stale state and would re-transmit the last message whose ack it never saw. This is fine, but it's an at-least-once guarantee, not exactly-once (the second sketch in the P.S. spells out this scenario).

Is exactly-once message delivery a goal for Kafka, and if so, what am I missing regarding the producer protocol being developed in KAFKA-49 and its intended/suggested usage? How do existing Kafka setups (at LinkedIn or elsewhere) deal with a producer being unable to transmit to a broker for some transient period without generating duplicates?

Apologies if this was better suited for kafka-dev; feel free to forward it there if that is the case.

Best,
b
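P.S. To make the two delivery-semantics points above concrete, here are two rough sketches in Java. They use made-up interfaces (SimpleFetcher, OffsetStore, AckingBroker) rather than the real Kafka client classes, so please read them as illustrations of the scenarios, not as code against the actual API.

First, the consumer side: exactly-once seems buildable because the consumer owns its offset and can persist it atomically with whatever it produces, so replaying after a crash never double-applies a message.

    // Hypothetical consumer loop: exactly-once by committing the processed
    // results together with the new offset in one atomic step. SimpleFetcher
    // and OffsetStore are illustrative stand-ins, not real Kafka classes.
    public class ExactlyOnceConsumerSketch {

        interface SimpleFetcher {
            // Fetch the next batch of messages starting at the given offset.
            java.util.List<String> fetch(long offset);
            // Offset immediately after the last message of the previous fetch.
            long nextOffset();
        }

        interface OffsetStore {
            long loadLastCommittedOffset();
            // Atomically persist the results together with the new offset.
            void commit(java.util.List<String> results, long newOffset);
        }

        public static void run(SimpleFetcher fetcher, OffsetStore store) {
            long offset = store.loadLastCommittedOffset();
            while (true) {
                java.util.List<String> batch = fetcher.fetch(offset);
                java.util.List<String> results = new java.util.ArrayList<>();
                for (String msg : batch) {
                    results.add(msg.toUpperCase()); // placeholder "processing"
                }
                long newOffset = fetcher.nextOffset();
                // The atomic commit is the whole trick: after a crash we resume
                // from the committed offset and never apply a message twice.
                store.commit(results, newOffset);
                offset = newOffset;
            }
        }
    }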
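Second, the producer side: if the ack (not the message) is lost, the producer can't tell whether the broker stored the message, so retrying gives at-least-once and giving up gives at-most-once.

    // Toy illustration of the lost-ack case described above. AckingBroker is
    // hypothetical; the real KAFKA-49 protocol may well look different.
    public class LostAckRetrySketch {

        interface AckingBroker {
            // Returns true if an ack made it back to the producer; false if the
            // ack was lost in transit (the message itself may have been stored).
            boolean send(String message);
        }

        public static void produce(AckingBroker broker, String message, int maxRetries) {
            for (int attempt = 0; attempt <= maxRetries; attempt++) {
                boolean acked = broker.send(message);
                if (acked) {
                    return; // happy path: one copy delivered and acknowledged
                }
                // No ack: the message may or may not already be on the broker.
                // Retrying here is what turns the guarantee into at-least-once;
                // not retrying would make it at-most-once instead.
            }
        }
    }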