[
https://issues.apache.org/jira/browse/KAFKA-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599108#comment-16599108
]
John Roesler commented on KAFKA-3596:
-------------------------------------
This should be fixed by a few changes coming in 2.1:
* window segments are no longer bounded by number, but by size. This means
that if you were to get events "from the future", they would no longer cause
still-live windows to be dropped
* we have revised the processing and stream-time model:
** We pull records one-at-a-time from the inputs in the order of their
timestamp. This should eliminate the cases where we process an event with a
much-advanced timestamp from queue A, while there are events with smaller
timestamps still to process from queue B. It won't eliminate cases where queue
A alone has out-of-order events.
** Stream time itself is now computed to be the non-decreasing maximum of
observed timestamps.
** Together, these changes mean that "future events" are no longer possible,
only "late events".
> Kafka Streams: Window expiration needs to consider more than event time
> -----------------------------------------------------------------------
>
> Key: KAFKA-3596
> URL: https://issues.apache.org/jira/browse/KAFKA-3596
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 0.10.0.0
> Reporter: Henry Cai
> Priority: Minor
> Labels: architecture
> Fix For: 2.1.0
>
>
> Currently in Kafka Streams, the way the windows are expired in RocksDB is
> triggered by new event insertion. When a window is created at T0 with 10
> minutes retention, when we saw a new record coming with event timestamp T0 +
> 10 +1, we will expire that window (remove it) out of RocksDB.
> In the real world, it's very easy to see event coming with future timestamp
> (or out-of-order events coming with big time gaps between events), this way
> of retiring a window based on one event's event timestamp is dangerous. I
> think at least we need to consider both the event's event time and
> server/stream time elapse.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)