[
https://issues.apache.org/jira/browse/KAFKA-20593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18081627#comment-18081627
]
Ankur Sinha commented on KAFKA-20593:
-------------------------------------
Thanks for the feedback, [~mjsax].
I agree that introducing a dedicated config for this is probably unnecessary,
especially if the behavior is only logging.
The main motivation behind the proposal was less about topic creation itself
being problematic, and more about improving visibility into implicit
repartitioning steps during development and deployment. While
`Topology.describe()` does expose the information, in practice many users
(especially newer Kafka Streams users) do not inspect the topology unless
debugging performance or operational issues later on.
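
For illustration, the information that `Topology.describe()` exposes can already be scanned for repartition sinks at startup. The following is a minimal sketch, not Kafka Streams code: it assumes the textual description lists sinks as `Sink: <node> (topic: <name>)`, it runs on a hand-written sample rather than real output, and the names `RepartitionScan` and `repartitionSinks` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepartitionScan {

    // Hand-written sample in the style of a topology description (illustrative only;
    // the exact text produced by a given Kafka Streams version may differ).
    static final String SAMPLE = String.join("\n",
        "Topologies:",
        "   Sub-topology: 0",
        "    Source: KSTREAM-SOURCE-0000000000 (topics: [Topic1])",
        "      --> KSTREAM-KEY-SELECT-0000000001",
        "    Processor: KSTREAM-KEY-SELECT-0000000001 (stores: [])",
        "      --> KSTREAM-SINK-0000000004",
        "      <-- KSTREAM-SOURCE-0000000000",
        "    Sink: KSTREAM-SINK-0000000004 (topic: Topic1-repartition)",
        "      <-- KSTREAM-KEY-SELECT-0000000001",
        "   Sub-topology: 1",
        "    Source: KSTREAM-SOURCE-0000000005 (topics: [Topic1-repartition])",
        "      --> KSTREAM-AGGREGATE-0000000003",
        "    Processor: KSTREAM-AGGREGATE-0000000003 (stores: [counts])",
        "      --> none",
        "      <-- KSTREAM-SOURCE-0000000005");

    // Matches "Sink: <node> (topic: <name>-repartition)".
    private static final Pattern REPARTITION_SINK =
        Pattern.compile("Sink:\\s+(\\S+)\\s+\\(topic:\\s+(\\S+-repartition)\\)");

    /** Maps each repartition topic name to the sink node that writes to it. */
    static Map<String, String> repartitionSinks(String description) {
        Map<String, String> topicToSink = new LinkedHashMap<>();
        Matcher m = REPARTITION_SINK.matcher(description);
        while (m.find()) {
            topicToSink.put(m.group(2), m.group(1));
        }
        return topicToSink;
    }

    public static void main(String[] args) {
        // One INFO-style line per repartition topic found in the description text.
        repartitionSinks(SAMPLE).forEach((topic, sink) ->
            System.out.println("INFO repartition topic " + topic + " written by " + sink));
    }
}
```

In an application, the string passed to `repartitionSinks` would presumably be `topology.describe().toString()`. Matching on text is fragile, so this only shows that the data is available, not how a built-in diagnostic should be implemented.
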
Repartition topics are often discovered only indirectly, through unexpected
internal topics and additional latency.
The intent was to make repartitioning behavior more visible proactively. That
said, I understand the concern that WARN level may be too strong since
repartitioning is expected behavior and not inherently problematic. INFO level
may be a more appropriate direction.
I’ll revisit the proposal with a smaller scope and think more about whether
there is a clearer operational gap beyond what `Topology.describe()` already
provides.
> Log a diagnostic warning at startup when Kafka Streams detects internal
> repartition topics
> ------------------------------------------------------------------------------------------
>
> Key: KAFKA-20593
> URL: https://issues.apache.org/jira/browse/KAFKA-20593
> Project: Kafka
> Issue Type: New Feature
> Components: streams
> Reporter: Ankur Sinha
> Priority: Minor
>
> *The Problem*
> When a topology changes a record key (e.g., .selectKey(), .map()) and
> subsequently calls a stateful operation (e.g., .groupByKey(), .join()), Kafka
> Streams automatically provisions and manages internal repartition topics on
> the broker cluster. While this is core architectural behavior, these topics
> are created implicitly. Developers frequently introduce accidental, highly
> expensive network shuffles without realizing the operational and cloud
> infrastructure cost impact. Currently, discovering these requires manually
> printing Topology.describe() or digging through noisy, verbose consumer group
> rebalance logs long after the application has started. For example, a
> developer today has to parse runtime blocks like this just to find hidden
> shuffles:
> {code:java}
> 2026-05-11T16:18:33.861Z INFO 1 --- [k-streams] [-StreamThread-1]
> o.a.k.c.c.internals.ConsumerCoordinator : [Consumer
> clientId=k-streams-1-consumer, groupId=k-streams_app_id] Updating assignment
> with
> Assigned partitions: [Topic1-3,
> k-streams_app_id-Topic2-repartition-1, k-streams_app_id-Topic2-repartition-5,
> k-streams_app_id-Topic3-repartition-3...]
> {code}
> This log makes it difficult to separate actual data storage from internal
> shuffle infrastructure, and it fails to explain which specific operator
> triggered the repartition.
> *The Solution*
> Add a configuration property that scans the compiled topology during
> initialization. If Kafka Streams detects that internal repartition topics
> will be generated on the cluster, it will log a clean, structured WARN block
> explicitly detailing them before the application begins processing data.
> New Configuration Property:
> {code:java}
> streams.warn.on.repartition (Boolean, default: true)
> {code}
> Proposed Warning Log Output Example:
> {code:java}
> WARN org.apache.kafka.streams.KafkaStreams - [Topology Diagnostics] Internal
> repartition topics detected:
> 1. Topic: k-streams_app_id-Topic1-repartition
> Trigger Operator: KSTREAM-AGGREGATE-0000000003 (groupByKey)
> Upstream Cause: Key was flagged as changed by an upstream operator
> (selectKey)
> 2. Topic: k-streams_app_id-Topic2-repartition
> Trigger Operator: KSTREAM-JOIN-0000000008 (join)
> {code}
> *Impact & Compatibility*
> * Backward Compatible: Yes. It only introduces diagnostic log messages. Existing
> applications will run completely unchanged.
> * Performance Impact: Negligible. The structural topology scan runs exactly
> once during the KafkaStreams startup sequence.
> * Opt-out: Users can disable this output by setting
> streams.warn.on.repartition=false.
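
To make the issue's point about the assignment log concrete, here is a rough sketch of the post-processing a developer would need today to pull repartition topics out of such a line. It assumes that partition assignments are printed as `<topic>-<partition>` and that internal repartition topic names end in `-repartition`, as in the quoted example; the names `AssignmentLogScan` and `repartitionPartitions` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AssignmentLogScan {

    // "<topic>-repartition-<partition>", e.g. "k-streams_app_id-Topic2-repartition-5".
    private static final Pattern REPARTITION_PARTITION =
        Pattern.compile("([\\w.-]+-repartition)-(\\d+)");

    /** Groups the assigned partition numbers by repartition topic name. */
    static Map<String, List<Integer>> repartitionPartitions(String logText) {
        Map<String, List<Integer>> byTopic = new TreeMap<>();
        Matcher m = REPARTITION_PARTITION.matcher(logText);
        while (m.find()) {
            byTopic.computeIfAbsent(m.group(1), k -> new ArrayList<>())
                   .add(Integer.parseInt(m.group(2)));
        }
        return byTopic;
    }

    public static void main(String[] args) {
        // Assignment list copied from the log excerpt in the issue, elision included.
        String log = "Assigned partitions: [Topic1-3, "
            + "k-streams_app_id-Topic2-repartition-1, k-streams_app_id-Topic2-repartition-5, "
            + "k-streams_app_id-Topic3-repartition-3...]";
        System.out.println(repartitionPartitions(log));
    }
}
```

Even with this filtering, the log only yields topic names and partition numbers. It says nothing about which operator triggered the repartition, which is the gap the issue describes.
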