AndrewJSchofield commented on code in PR #17454:
URL: https://github.com/apache/kafka/pull/17454#discussion_r1907876806


##########
docs/design.html:
##########
@@ -290,24 +290,58 @@ <h3 class="anchor-heading"><a id="semantics" 
class="anchor-link"></a><a href="#s
     messages have a primary key and so the updates are idempotent (receiving 
the same message twice just overwrites a record with another copy of itself).
     </ol>
     <p>
-    So what about exactly once semantics (i.e. the thing you actually want)? 
When consuming from a Kafka topic and producing to another topic (as in a <a 
href="https://kafka.apache.org/documentation/streams">Kafka Streams</a>
-    application), we can leverage the new transactional producer capabilities 
in 0.11.0.0 that were mentioned above. The consumer's position is stored as a 
message in a topic, so we can write the offset to Kafka in the
-    same transaction as the output topics receiving the processed data. If the 
transaction is aborted, the consumer's position will revert to its old value 
and the produced data on the output topics will not be visible
-    to other consumers, depending on their "isolation level." In the default 
"read_uncommitted" isolation level, all messages are visible to consumers even 
if they were part of an aborted transaction,
-    but in "read_committed," the consumer will only return messages from 
transactions which were committed (and any messages which were not part of a 
transaction).
+    So what about exactly-once semantics? When consuming from a Kafka topic 
and producing to another topic (as in a <a 
href="https://kafka.apache.org/documentation/streams">Kafka Streams</a> 
application), we can
+    leverage the new transactional producer capabilities in 0.11.0.0 that were 
mentioned above. The consumer's position is stored as a message in an internal 
topic, so we can write the offset to Kafka in the
+    same transaction as the output topics receiving the processed data. If the 
transaction is aborted, the consumer's stored position will revert to its old 
value (although the consumer has to refetch the
+    committed offset because it does not automatically rewind) and the 
produced data on the output topics will not be visible to other consumers, 
depending on their "isolation level". In the default
+    "read_uncommitted" isolation level, all messages are visible to consumers 
even if they were part of an aborted transaction, but in "read_committed" 
isolation level, the consumer will only return messages
+    from transactions which were committed (and any messages which were not 
part of a transaction).
     <p>
     When writing to an external system, the limitation is in the need to 
coordinate the consumer's position with what is actually stored as output. The 
classic way of achieving this would be to introduce a two-phase
-    commit between the storage of the consumer position and the storage of the 
consumers output. But this can be handled more simply and generally by letting 
the consumer store its offset in the same place as
+    commit between the storage of the consumer position and the storage of the 
consumers output. This can be handled more simply and generally by letting the 
consumer store its offset in the same place as
     its output. This is better because many of the output systems a consumer 
might want to write to will not support a two-phase commit. As an example of 
this, consider a
     <a href="https://kafka.apache.org/documentation/#connect">Kafka 
Connect</a> connector which populates data in HDFS along with the offsets of 
the data it reads so that it is guaranteed that either data and
     offsets are both updated or neither is. We follow similar patterns for 
many other data systems which require these stronger semantics and for which 
the messages do not have a primary key to allow for deduplication.
     <p>
-    So effectively Kafka supports exactly-once delivery in <a 
href="https://kafka.apache.org/documentation/streams">Kafka Streams</a>, and 
the transactional producer/consumer can be used generally to provide
+    As a result, Kafka supports exactly-once delivery in <a 
href="https://kafka.apache.org/documentation/streams">Kafka Streams</a>, and 
the transactional producer/consumer can be used generally to provide
     exactly-once delivery when transferring and processing data between Kafka 
topics. Exactly-once delivery for other destination systems generally requires 
cooperation with such systems, but Kafka provides the

Review Comment:
   I've reworded that part.
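
   For anyone following along, here is a toy in-memory model of the behaviour the reworded paragraph describes. It is only a sketch and not the Kafka client API: `ToyBroker`, `process`, and every method name are hypothetical, and it simplifies real brokers (for example, it simply filters out records of open transactions for "read_committed" readers). It shows that an aborted transaction leaves the stored position unchanged, that the caller's in-memory position does not rewind by itself, and how the two isolation levels differ in what they return.

```python
class ToyBroker:
    """Hypothetical in-memory stand-in for a broker (not the Kafka API)."""

    def __init__(self):
        self.output = []           # (value, txn_id) pairs; txn_id None = non-transactional
        self.txn_state = {}        # txn_id -> "open" | "committed" | "aborted"
        self.committed_offset = 0  # the consumer's stored position
        self._pending_offset = {}  # offsets staged inside open transactions
        self._next_txn = 0

    def begin(self):
        txn = self._next_txn
        self._next_txn += 1
        self.txn_state[txn] = "open"
        return txn

    def send(self, txn, value):
        # Records are appended immediately; visibility is decided at read time.
        self.output.append((value, txn))

    def send_offset(self, txn, offset):
        # The position is staged in the same transaction as the output records.
        self._pending_offset[txn] = offset

    def commit(self, txn):
        self.txn_state[txn] = "committed"
        if txn in self._pending_offset:
            self.committed_offset = self._pending_offset.pop(txn)

    def abort(self, txn):
        # The staged offset is discarded, so the stored position keeps its old value.
        self.txn_state[txn] = "aborted"
        self._pending_offset.pop(txn, None)

    def read(self, isolation):
        if isolation == "read_uncommitted":
            return [value for value, _ in self.output]
        return [value for value, txn in self.output
                if txn is None or self.txn_state[txn] == "committed"]


def process(broker, inputs, position, fail=False):
    """Consume inputs[position:], write upper-cased copies, stage the new offset."""
    txn = broker.begin()
    for value in inputs[position:]:
        broker.send(txn, value.upper())
    position = len(inputs)
    broker.send_offset(txn, position)
    if fail:
        broker.abort(txn)
    else:
        broker.commit(txn)
    # The in-memory position has advanced even if the transaction aborted.
    return position


broker = ToyBroker()
inputs = ["a", "b"]

in_memory = process(broker, inputs, broker.committed_offset, fail=True)
# After the abort the caller has to refetch the stored position itself.
in_memory_after_refetch = broker.committed_offset
process(broker, inputs, in_memory_after_refetch)
```

   In this model the retry only reprocesses the input because the caller explicitly went back to `broker.committed_offset`; continuing from the value returned by the failed `process` call would have skipped both records.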



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
