Sören Henning created KAFKA-9647:
------------------------------------
Summary: Add ability to suppress until window end (not close)
Key: KAFKA-9647
URL: https://issues.apache.org/jira/browse/KAFKA-9647
Project: Kafka
Issue Type: Wish
Components: streams
Reporter: Sören Henning
*Preface:* This feature request originates from a [recently asked question on
Stack
Overflow|https://stackoverflow.com/questions/60005630/kafka-streams-suppress-until-window-end-not-close],
for which Matthias J. Sax suggested to create a feature request.
*Feature Request:* In addition to suppressing updates to a windowed KTable
until a window closes, we suggest to only suppress "early" results. By early
results we mean results computed before the window ends, but not those results
occurring during the grace period. Thus, this suppress option would suppress
all aggregation results with timestamp < window end, but forward all records
with timestamp >= window end and timestamp < window close.
*Use Case:* For an exemplary use case, we refer to John Roesler's [blog post on
the initial introduction of the suppress
operator|https://www.confluent.io/blog/kafka-streams-take-on-watermarks-and-triggers/].
The post argues that for the case of altering not every intermediate
aggregation result should trigger an alert message, but only the "final"
result. Otherwise, a "follow-up email telling people to ignore the first
message" might become required if the final results would not cause an alert
but intermediate results would. Kafka Streams' current solution for this use
case would be to use a suppress operation, which would only forward the final
result, which would be the last result before no further updates could occur.
This is when the grace period of a window passed (the window closes).
However, ideally we would like to set the grace period a large as possible to
allow for very late-arriving messages, which in turn would lead to very late
alerts. On the other hand, such late-arriving messages are rare in practice and
normally the order of events corresponds largely to the order of messages.
Thus, a reasonable option would be to suppress aggregation results only until
the window ends (i.e. stream time > window end) and then forward this "most
likely final" result. For the use case of altering, this means an alert is
triggered when we are relatively certain that recorded data requires an alert.
Then, only the "seldom" case of late updates which would change our decision
would require the "follow-up email telling people to ignore the first message".
Such rare "correction" should be acceptable for many use cases.
*Further extension:* In addition to suppressing all updates until the window
ends and afterwards forwarding all updates, a further extension would be to
only forward late records every x seconds. Maybe the existing
`Suppressed.untilTimeLimit( .. )` could be reused for this.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)