Repository: kafka
Updated Branches:
  refs/heads/0.11.0 57181bb77 -> 028b939da


KAFKA-5290; Docs need clarification on meaning of 'committed' to the log

Based on conversations with vahidhashemian, rajinisivaram and apurvam.

The docs didn't make clear that what gets committed and what does not may
depend on the producer's acks setting.

Author: Edoardo Comar <[email protected]>

Reviewers: Vahid Hashemian <[email protected]>, Apurva Mehta 
<[email protected]>, Jason Gustafson <[email protected]>

Closes #3035 from edoardocomar/DOC-clarification-on-committed

(cherry picked from commit 491774bd52bfe34f32f935bc4fc5db22dc8ceba2)
Signed-off-by: Jason Gustafson <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/kafka/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka/commit/028b939d
Tree: http://git-wip-us.apache.org/repos/asf/kafka/tree/028b939d
Diff: http://git-wip-us.apache.org/repos/asf/kafka/diff/028b939d

Branch: refs/heads/0.11.0
Commit: 028b939daa30f57894a86480dbeaac066a2b1bc1
Parents: 57181bb
Author: Edoardo Comar <[email protected]>
Authored: Tue Jun 20 14:16:29 2017 -0700
Committer: Jason Gustafson <[email protected]>
Committed: Tue Jun 20 14:22:33 2017 -0700

----------------------------------------------------------------------
 docs/design.html | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka/blob/028b939d/docs/design.html
----------------------------------------------------------------------
diff --git a/docs/design.html b/docs/design.html
index 0373a2d..8814c57 100644
--- a/docs/design.html
+++ b/docs/design.html
@@ -238,9 +238,9 @@
     can fail, cases where there are multiple consumer processes, or cases 
where data written to disk can be lost).
     <p>
     Kafka's semantics are straight-forward. When publishing a message we have 
a notion of the message being "committed" to the log. Once a published message 
is committed it will not be lost as long as one broker that
-    replicates the partition to which this message was written remains 
"alive". The definition of alive as well as a description of which types of 
failures we attempt to handle will be described in more detail in the
-    next section. For now let's assume a perfect, lossless broker and try to 
understand the guarantees to the producer and consumer. If a producer attempts 
to publish a message and experiences a network error it cannot
-    be sure if this error happened before or after the message was committed. 
This is similar to the semantics of inserting into a database table with an 
autogenerated key.
+    replicates the partition to which this message was written remains 
"alive". The definition of committed message, alive partition as well as a 
description of which types of failures we attempt to handle will be 
+    described in more detail in the next section. For now let's assume a 
perfect, lossless broker and try to understand the guarantees to the producer 
and consumer. If a producer attempts to publish a message and 
+    experiences a network error it cannot be sure if this error happened 
before or after the message was committed. This is similar to the semantics of 
inserting into a database table with an autogenerated key.
     <p>
     These are not the strongest possible semantics for publishers. Although we 
cannot be sure of what happened in the case of a network error, it is possible 
to allow the producer to generate a sort of "primary key" that
     makes retrying the produce request idempotent. This feature is not trivial 
for a replicated system because of course it must work even (or especially) in 
the case of a server failure. With this feature it would
@@ -298,9 +298,13 @@
     In distributed systems terminology we only attempt to handle a 
"fail/recover" model of failures where nodes suddenly cease working and then 
later recover (perhaps without knowing that they have died). Kafka does not
     handle so-called "Byzantine" failures in which nodes produce arbitrary or 
malicious responses (perhaps due to bugs or foul play).
     <p>
-    A message is considered "committed" when all in sync replicas for that 
partition have applied it to their log. Only committed messages are ever given 
out to the consumer. This means that the consumer need not worry
-    about potentially seeing a message that could be lost if the leader fails. 
Producers, on the other hand, have the option of either waiting for the message 
to be committed or not, depending on their preference for
-    tradeoff between latency and durability. This preference is controlled by 
the acks setting that the producer uses.
+    We can now more precisely define that a message is considered committed 
when all in sync replicas for that partition have applied it to their log.
+    Only committed messages are ever given out to the consumer. This means 
that the consumer need not worry about potentially seeing a message that could 
be lost if the leader fails. Producers, on the other hand, 
+    have the option of either waiting for the message to be committed or not, 
depending on their preference for tradeoff between latency and durability. This 
preference is controlled by the acks setting that the 
+    producer uses.
+    Note that topics have a setting for the "minimum number" of in-sync 
replicas that is checked when the producer requests acknowledgment that a 
message
+    has been written to the full set of in-sync replicas. If a less stringent 
acknowledgement is requested by the producer, then the message can be 
committed, and consumed, 
+    even if the number of in-sync replicas is lower than the minimum (e.g. it 
can be as low as just the leader).
     <p>
     The guarantee that Kafka offers is that a committed message will not be 
lost, as long as there is at least one in sync replica alive, at all times.
     <p>
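
As an illustration of the note added in the second hunk, here is a minimal
sketch of how the producer's acks setting interacts with the topic's minimum
in-sync replica count. It is a simplified model under stated assumptions, not
Kafka's broker code: the class AcksSketch and the methods leaderAcceptsWrite
and producerProps are hypothetical names, and localhost:9092 is a placeholder
address. Only the configuration keys "acks" and "bootstrap.servers" are taken
from the Kafka producer configuration.

```java
import java.util.Properties;

// Hypothetical sketch; models only the minimum in-sync replica check
// described in the doc change above, not Kafka's actual broker logic.
class AcksSketch {

    // Returns true if the leader would accept the write in this model.
    // The minimum ISR count is checked only when the producer asks for
    // acknowledgment from the full set of in-sync replicas (acks=all or -1).
    static boolean leaderAcceptsWrite(String acks, int inSyncReplicas, int minInSyncReplicas) {
        boolean requiresFullIsr = acks.equals("all") || acks.equals("-1");
        return !requiresFullIsr || inSyncReplicas >= minInSyncReplicas;
    }

    // Builds producer properties with the chosen acks level.
    // The bootstrap address is a placeholder assumption.
    static Properties producerProps(String acks) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("acks", acks);
        return props;
    }

    public static void main(String[] args) {
        int inSyncReplicas = 1;     // only the leader is in sync
        int minInSyncReplicas = 2;  // assumed topic-level minimum
        for (String acks : new String[] {"0", "1", "all"}) {
            System.out.println("acks=" + acks + " accepted="
                + leaderAcceptsWrite(acks, inSyncReplicas, minInSyncReplicas));
        }
    }
}
```

With one in-sync replica and a minimum of two, this model accepts the write
for acks=0 and acks=1 and rejects it for acks=all, which matches the case the
new paragraph describes: under a less stringent acks setting a message can be
committed and consumed even when the in-sync set has shrunk to just the leader.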
