A. Sophie Blee-Goldman created KAFKA-12710:
----------------------------------------------
Summary: Consider enabling (at least some) optimizations by default
Key: KAFKA-12710
URL: https://issues.apache.org/jira/browse/KAFKA-12710
Project: Kafka
Issue Type: Improvement
Components: streams
Reporter: A. Sophie Blee-Goldman

Topology optimizations such as repartition topic consolidation and reuse of the
source topic as a changelog are extremely useful for reducing the footprint of a
Kafka Streams application on the broker. The additional storage and resource
utilization due to changelogs and repartition topics is a very real pain point,
and has even been cited as the reason for turning to other stream processing
frameworks in the past (though of course I question that judgement).

The repartition topic optimization, at the very least, should be enabled by
default. The problem is that we can't just flip the switch without breaking
existing applications during upgrade, since the location and name of such
topics in the topology may change. One possibility is to just detect this
situation and disable the optimization if we find that it would produce an
incompatible topology for an existing application. We can determine that this
is the case simply by looking for pre-existing repartition topics. If any such
topics are present, and match the set of repartition topics in the un-optimized
topology, then we know we need to switch the optimization off. If we don't find
any repartition topics, or they match the optimized topology, then we're safe
to enable it by default.
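
The check described above can be sketched roughly as follows. This is only an illustration: the class, enum, and method names are hypothetical and not part of Kafka Streams. It assumes repartition topics can be recognized by the "<application.id>-...-repartition" naming pattern, and that "match" means exact set equality. The case where the existing topics match neither topology is not covered by the proposal; the sketch conservatively leaves the optimization off.

```java
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of Kafka Streams. */
final class RepartitionOptimizationCheck {

    enum Decision { ENABLE, DISABLE }

    /**
     * @param brokerTopics  all topic names currently present on the broker
     * @param applicationId the application.id, which prefixes internal topics
     * @param unoptimized   repartition topics of the un-optimized topology
     * @param optimized     repartition topics of the optimized topology
     */
    static Decision decide(Set<String> brokerTopics,
                           String applicationId,
                           Set<String> unoptimized,
                           Set<String> optimized) {
        String prefix = applicationId + "-";
        // Keep only this application's pre-existing repartition topics.
        Set<String> existing = brokerTopics.stream()
            .filter(t -> t.startsWith(prefix) && t.endsWith("-repartition"))
            .collect(Collectors.toSet());

        if (existing.isEmpty()) {
            return Decision.ENABLE;   // new application, nothing to break
        }
        if (existing.equals(optimized)) {
            return Decision.ENABLE;   // already running the optimized topology
        }
        if (existing.equals(unoptimized)) {
            return Decision.DISABLE;  // upgrading an un-optimized application
        }
        // Assumption: an ambiguous state is treated as incompatible.
        return Decision.DISABLE;
    }
}
```

In practice the two expected topic sets would have to be derived by building the topology both with and without the optimization; how best to do that is left open here.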

Alternatively, we could just do a KIP to indicate that we intend to change the
default in the next breaking release and that existing applications should
override this config if necessary. We should be able to implement a fail-safe
that shuts the application down if a user misses or forgets to do so, using the
detection method described above.
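
For reference, overriding the config in an existing application might look like the sketch below. The class and method names are hypothetical. The keys and the value are written as plain strings so the snippet needs no Kafka dependency; "topology.optimization" and "none" are the string forms of StreamsConfig.TOPOLOGY_OPTIMIZATION_CONFIG and StreamsConfig.NO_OPTIMIZATION.

```java
import java.util.Properties;

/** Hypothetical helper, not part of Kafka Streams. */
final class PinnedOptimizationConfig {

    // String forms of the StreamsConfig constants named in the lead-in.
    static final String TOPOLOGY_OPTIMIZATION = "topology.optimization";
    static final String NO_OPTIMIZATION = "none";

    /** Builds Streams properties that explicitly keep optimizations off. */
    static Properties pinned(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // Pin the current behavior so a changed default cannot alter the topology.
        props.put(TOPOLOGY_OPTIMIZATION, NO_OPTIMIZATION);
        return props;
    }
}
```

Note that the same properties also need to be passed to StreamsBuilder#build(Properties) for the setting to be taken into account when the topology is built.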

The source topic optimization is perhaps more controversial, as a few issues
have been raised with regard to things like [restoring bad data and
asymmetric serdes|https://issues.apache.org/jira/browse/KAFKA-8037], or more
recently the bug discovered in the [emit-on-change semantics for
KTables|https://issues.apache.org/jira/browse/KAFKA-12508?focusedCommentId=17306323&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17306323].
However, in this case at least there are no compatibility concerns. It's safe
to upgrade from using a separate changelog for a source KTable to just using
that source topic directly, although the reverse is not true. We could even
automatically delete the no-longer-necessary changelog for upgrading
applications.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)