Matthias J. Sax created KAFKA-20099:
---------------------------------------

             Summary: Clarify how "stream-time" works in the docs
                 Key: KAFKA-20099
                 URL: https://issues.apache.org/jira/browse/KAFKA-20099
             Project: Kafka
          Issue Type: Improvement
          Components: docs, streams
            Reporter: Matthias J. Sax


In Kafka Streams, we have the high-level concept of "stream-time", which we 
briefly explain in the docs. However, we do not fully cover the details. Given 
other external material on the topic (blog posts, talks), the high-level concept 
of "stream-time" is well established and known by users.

However, the implementation details matter, and we have seen users 
misunderstand how it works in detail. We should update the docs 
accordingly.

The most important notion is that the KS runtime implements "stream-time" on a 
per-task level, based on the input records for the task (we can call this "task 
time"). This implies that there is no global notion of "stream time", and each 
task tracks its own "stream time" independently. Because "task time" is a 
runtime concept, KS also preserves "task time" across rebalances and 
application restarts.
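
To make the per-task notion concrete, here is a minimal toy model, not Kafka 
Streams code. It assumes that "task time" is the highest record timestamp a 
task has observed so far; the class and method names (TaskTimeSketch, observe, 
timeOf) and the -1 "unknown" placeholder are hypothetical and only serve the 
illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: NOT the Kafka Streams implementation. All names are hypothetical.
public class TaskTimeSketch {
    // One "stream-time" value per task id; there is deliberately no global clock.
    private final Map<String, Long> taskTime = new HashMap<>();

    // Assumption: a task's time is the max timestamp observed so far,
    // so an out-of-order record never moves it backwards.
    public long observe(String taskId, long recordTimestampMs) {
        return taskTime.merge(taskId, recordTimestampMs, Math::max);
    }

    // Returns -1 as an "unknown" placeholder for tasks that saw no record yet.
    public long timeOf(String taskId) {
        return taskTime.getOrDefault(taskId, -1L);
    }

    public static void main(String[] args) {
        TaskTimeSketch tracker = new TaskTimeSketch();
        tracker.observe("0_0", 1_000L);
        tracker.observe("0_0", 5_000L);
        tracker.observe("0_0", 3_000L); // out-of-order: task 0_0 stays at 5000
        tracker.observe("0_1", 2_000L); // task 0_1 advances independently
        System.out.println("task 0_0: " + tracker.timeOf("0_0"));
        System.out.println("task 0_1: " + tracker.timeOf("0_1"));
    }
}
```

In this sketch, task 0_1 is unaffected by how far task 0_0 has advanced, which 
is the "no global stream time" point made above.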

In contrast, some DSL operators (those that use a grace period) implement their 
own "stream time" tracking (we can call this "operator time"). An individual 
Processor might not get all records from the task input topics (upstream 
operators might filter or cache records), and most importantly, operators 
track their internal "stream time" only in memory, so it is not preserved 
across rebalances / restarts. "Operator time" is re-established based on 
the first input record after a rebalance / restart. This is an important 
difference from "task time", as it impacts how grace periods are applied, and 
it should be documented.
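
The effect on grace periods can be illustrated with a small toy model, again 
not Kafka Streams code. It assumes a single window, an operator that keeps its 
observed time only in memory, and a rule that a record is dropped once the 
operator's time has reached window end plus grace. GraceOperator, its fields, 
and that exact drop rule are assumptions made for the illustration.

```java
// Toy model only: NOT the Kafka Streams implementation. All names are hypothetical.
public class OperatorTimeSketch {
    static final long UNKNOWN = -1L;

    // Hypothetical operator for one window [.., windowEndMs) with a grace period.
    static class GraceOperator {
        private final long windowEndMs;
        private final long graceMs;
        // Kept in memory only: a re-created operator starts from UNKNOWN again.
        private long operatorTime = UNKNOWN;

        GraceOperator(long windowEndMs, long graceMs) {
            this.windowEndMs = windowEndMs;
            this.graceMs = graceMs;
        }

        // Returns true if the record is accepted, false if it is dropped as late.
        boolean process(long recordTimestampMs) {
            operatorTime = Math.max(operatorTime, recordTimestampMs);
            boolean belongsToWindow = recordTimestampMs < windowEndMs;
            // Assumed rule: the window is closed once time >= window end + grace.
            boolean windowClosed = operatorTime >= windowEndMs + graceMs;
            return !(belongsToWindow && windowClosed);
        }
    }

    public static void main(String[] args) {
        GraceOperator beforeRestart = new GraceOperator(10_000L, 5_000L);
        beforeRestart.process(20_000L); // advances operator time past end + grace
        System.out.println("before restart, ts=8000 accepted: "
                + beforeRestart.process(8_000L));

        // Simulated rebalance / restart: a fresh operator with no remembered time.
        GraceOperator afterRestart = new GraceOperator(10_000L, 5_000L);
        System.out.println("after restart, ts=8000 accepted: "
                + afterRestart.process(8_000L));
    }
}
```

In this model the same late record is dropped before the simulated restart but 
accepted afterwards, because the fresh operator re-establishes its time from 
the first record it sees.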

Cf. https://issues.apache.org/jira/browse/KAFKA-9368



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
