[ https://issues.apache.org/jira/browse/KAFKA-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926580#comment-17926580 ]
Steven Schlansker commented on KAFKA-8770:
------------------------------------------
Hi [~vvcephei], thank you for your work on emit-on-change semantics.
We just built a nontrivial Kafka Streams app under the impression that
duplicate updates would be suppressed. That matters for our use case, because
otherwise we do a massive amount of redundant processing under catch-up
conditions.
In testing, we then discovered the saga of KIP-557. The last time we used Kafka
Streams, this important optimization was available. We did not re-check, so we
did not notice that it had been reverted, and now we are stuck with an app that
is much slower than we anticipated.
Is there any way we could help push a new version of this optimization forward?
It seems that it was reverted years ago and there has not been much activity
since.
> Either switch to or add an option for emit-on-change
> ----------------------------------------------------
>
> Key: KAFKA-8770
> URL: https://issues.apache.org/jira/browse/KAFKA-8770
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: John Roesler
> Priority: Major
> Labels: needs-kip
>
> Currently, Streams offers two emission models:
> * emit-on-window-close: (using Suppression)
> * emit-on-update: (i.e., emit a new result whenever a new record is
> processed, regardless of whether the result has changed)
> There is also an option to drop some intermediate results, either using
> caching or suppression.
> However, there is no support for emit-on-change, in which results would be
> forwarded only if the result has changed. This has been reported to be
> extremely valuable as a performance optimization for some high-traffic
> applications, and it reduces the computational burden both internally, for
> downstream Streams operations, and externally, for systems that consume
> the results and currently have to deal with a lot of "no-op" changes.
> It would be pretty straightforward to implement this, by loading the prior
> results before a stateful operation and comparing with the new result before
> persisting or forwarding. In many cases, we load the prior result anyway, so
> it may not be a significant performance impact either.
> One design challenge is what to do with timestamps. If we get one record at
> time 1 that produces a result, and then another at time 2 that produces a
> no-op, what should be the timestamp of the result, 1 or 2? emit-on-change
> would require us to say 1.
> Clearly, we'd need to do some serious benchmarks to evaluate any potential
> implementation of emit-on-change.
> Another design challenge is to decide if we should just automatically provide
> emit-on-change for stateful operators, or if it should be configurable.
> Configuration increases complexity, so unless the performance impact is high,
> we may just want to change the emission model without a configuration.
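
For illustration only, the load-compare-forward idea from the description
quoted above could look roughly like the following plain-Java sketch. It does
not use the Kafka Streams API: EmitOnChangeSketch, Stamped, process and the
HashMap standing in for a state store are all hypothetical names chosen for
this example. On a no-op update it neither persists nor forwards, so the stored
timestamp stays at the time of the last real change (time 1 in the example from
the description).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for a stateful operator; not part of Kafka Streams. */
public class EmitOnChangeSketch {

    /** A result value plus the timestamp at which it last actually changed. */
    public static final class Stamped {
        public final String value;
        public final long timestamp;

        public Stamped(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Assumption: a simple map stands in for the operator's state store.
    private final Map<String, Stamped> store = new HashMap<>();

    // Assumption: a list stands in for the downstream; entries are "key=value@timestamp".
    private final List<String> forwarded = new ArrayList<>();

    /** Processes one new result; returns true if it was forwarded downstream. */
    public boolean process(String key, String newValue, long timestamp) {
        // Load the prior result before persisting or forwarding.
        Stamped prior = store.get(key);
        if (prior != null && Objects.equals(prior.value, newValue)) {
            // No-op update: skip both the write and the forward, which also
            // keeps the timestamp of the last real change.
            return false;
        }
        store.put(key, new Stamped(newValue, timestamp));
        forwarded.add(key + "=" + newValue + "@" + timestamp);
        return true;
    }

    /** Timestamp currently stored for the key (the key must have been seen). */
    public long timestampOf(String key) {
        return store.get(key).timestamp;
    }

    public List<String> forwarded() {
        return forwarded;
    }

    public static void main(String[] args) {
        EmitOnChangeSketch op = new EmitOnChangeSketch();
        op.process("k", "a", 1L); // first result for the key: forwarded
        op.process("k", "a", 2L); // same result again: suppressed
        op.process("k", "b", 3L); // result changed: forwarded
        System.out.println(op.forwarded()); // prints [k=a@1, k=b@3]
    }
}
```

A real operator would compare serialized bytes or rely on a meaningful
equals(), and would also have to handle tombstones and windowed results, none
of which this sketch attempts.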
--
This message was sent by Atlassian Jira
(v8.20.10#820010)