This is an automated email from the ASF dual-hosted git repository.

guozhang pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/kafka.git
The following commit(s) were added to refs/heads/trunk by this push:
     new 68b2f49  KAFKA-3514, Documentations: Add out of ordering in concepts. (#5458)
68b2f49 is described below

commit 68b2f49ea75059df5527378e8ae771195029c98a
Author: Guozhang Wang <wangg...@gmail.com>
AuthorDate: Tue Sep 11 16:28:52 2018 -0700

    KAFKA-3514, Documentations: Add out of ordering in concepts. (#5458)

    Reviewers: Matthias J. Sax <matth...@confluent.io>, John Roesler <j...@confluent.io>, Bill Bejeck <b...@confluent.io>
---
 docs/streams/core-concepts.html | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/docs/streams/core-concepts.html b/docs/streams/core-concepts.html
index 3f9eab5..015fbb4 100644
--- a/docs/streams/core-concepts.html
+++ b/docs/streams/core-concepts.html
@@ -172,6 +172,37 @@
     More details can be found in the <a href="/{{version}}/documentation#streamsconfigs"><b>Kafka Streams Configs</b></a> section.
     </p>
+
+    <h3><a id="streams_out_of_ordering" href="#streams_out_of_ordering">Out-of-Order Handling</a></h3>
+
+    <p>
+        Besides the guarantee that each record will be processed exactly-once, another issue that many stream processing applications face is how to
+        handle <a href="tbd">out-of-order data</a> that may impact their business logic. In Kafka Streams, there are two causes that could potentially
+        result in records arriving out of order with respect to their timestamps:
+    </p>
+
+    <ul>
+        <li> Within a topic-partition, a record's timestamp may not be monotonically increasing along with its offset. Since Kafka Streams always processes records within a topic-partition in offset order,
+             records with larger timestamps (but smaller offsets) may be processed earlier than records with smaller timestamps (but larger offsets) in the same topic-partition.
+        </li>
+        <li> Within a <a href="/{{version}}/documentation/streams/architecture#streams_architecture_tasks">stream task</a> that may be processing multiple topic-partitions, if users configure the application not to wait until all partitions contain some buffered data and instead
+             to pick the record with the smallest timestamp among the available partitions to process next, then records fetched later from the other topic-partitions may carry timestamps smaller than those of records already processed from another topic-partition.
+        </li>
+    </ul>
+
+    <p>
+        For stateless operations, out-of-order data does not impact the processing logic, since only one record is considered at a time, without looking into the history of past processed records;
+        for stateful operations such as aggregations and joins, however, out-of-order data could cause the processing logic to be incorrect. To handle such out-of-order data, users generally need to allow their applications
+        to wait longer while bookkeeping state during that wait time, i.e. to make trade-off decisions between latency, cost, and correctness.
+        In Kafka Streams specifically, users can configure the window operators of their windowed aggregations to achieve such trade-offs (details can be found in the <a href="/{{version}}/documentation/streams/developer-guide"><b>Developer Guide</b></a>).
+    </p>
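+
+    <p>
+        As an illustration of this trade-off (a minimal sketch rather than a complete recipe, using a hypothetical input topic named "events", and assuming that default serdes are configured and that the Streams version in use provides the <code>TimeWindows#grace()</code> setting), the following windowed count keeps each window open for out-of-order records for an extra grace period before it is closed:
+    </p>
+
+    <pre><code>
+import java.time.Duration;
+
+import org.apache.kafka.streams.StreamsBuilder;
+import org.apache.kafka.streams.kstream.KStream;
+import org.apache.kafka.streams.kstream.TimeWindows;
+
+StreamsBuilder builder = new StreamsBuilder();
+// Hypothetical input topic; key/value serdes come from the default config.
+KStream&lt;String, String&gt; events = builder.stream("events");
+
+events.groupByKey()
+      // 5-minute windows that keep accepting out-of-order records for one extra
+      // minute before being closed: a larger grace period trades higher latency
+      // and state-retention cost for more complete windowed results.
+      .windowedBy(TimeWindows.of(Duration.ofMinutes(5)).grace(Duration.ofMinutes(1)))
+      .count();
+</code></pre>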
+    <p>
+        As for joins, users have to be aware that some out-of-order data cannot yet be handled in Streams by increasing latency and cost:
+    </p>
+
+    <ul>
+        <li> For Stream-Stream joins, all three types (inner, outer, left) handle out-of-order records correctly, but the resulting stream may contain unnecessary leftRecord-null results for left joins, and leftRecord-null or null-rightRecord results for outer joins. </li>
+        <li> For Stream-Table joins, out-of-order records are not handled (i.e., Streams applications don't check for out-of-order records and just process all records in offset order), and hence the join may produce unpredictable results. </li>
+        <li> For Table-Table joins, out-of-order records are not handled (i.e., Streams applications don't check for out-of-order records and just process all records in offset order). However, the join result is a changelog stream and hence will be eventually consistent. </li>
+    </ul>
+
     <div class="pagination">
         <a href="/{{version}}/documentation/streams/tutorial" class="pagination__btn pagination__btn__prev">Previous</a>
         <a href="/{{version}}/documentation/streams/architecture" class="pagination__btn pagination__btn__next">Next</a>