[
https://issues.apache.org/jira/browse/KAFKA-20593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18081325#comment-18081325
]
Jahnavi Patel commented on KAFKA-20593:
---------------------------------------
Can I assign this ticket to myself? I'd like to work on it.
> Log a diagnostic warning at startup when Kafka Streams detects internal
> repartition topics
> ------------------------------------------------------------------------------------------
>
> Key: KAFKA-20593
> URL: https://issues.apache.org/jira/browse/KAFKA-20593
> Project: Kafka
> Issue Type: New Feature
> Components: streams
> Reporter: Ankur Sinha
> Priority: Minor
>
> *The Problem*
> When a topology changes a record key (e.g., .selectKey(), .map()) and
> subsequently calls a stateful operation (e.g., .groupByKey(), .join()), Kafka
> Streams automatically provisions and manages internal repartition topics on
> the broker cluster.While this is core architectural behavior, these topics
> are created implicitly. Developers frequently introduce accidental, highly
> expensive network shuffles without realizing the operational and cloud
> infrastructure cost impact.Currently, discovering these requires manually
> printing Topology.describe() or digging through noisy, verbose consumer group
> rebalance logs long after the application has started. For example, a
> developer today has to parse runtime blocks like this just to find hidden
> shuffles:
> {code:java}
> 2026-05-11T16:18:33.861Z INFO 1 --- [k-streams] [-StreamThread-1]
> o.a.k.c.c.internals.ConsumerCoordinator : [Consumer
> clientId=k-streams-1-consumer, groupId=k-streams_app_id] Updating assignment
> with
> Assigned partitions: [Topic1-3,
> k-streams_app_id-Topic2-repartition-1, k-streams_app_id-Topic2-repartition-5,
> k-streams_app_id-Topic3-repartition-3...]
> {code}
> This log makes it difficult to separate actual data storage from internal
> shuffle infrastructure, and it fails to explain which specific operator
> triggered the repartition.
> *The Solution*
> Add a configuration property that scans the compiled topology during
> initialization. If Kafka Streams detects that internal repartition topics
> will be generated on the cluster, it will log a clean, structured WARN block
> explicitly detailing them before the application begins processing data.New
> Configuration Property:
> {code:java}
> streams.warn.on.repartition (Boolean, default: true)
> {code}
> Proposed Warning Log Output Example:
> {code:java}
> WARN org.apache.kafka.streams.KafkaStreams - [Topology Diagnostics] Internal
> repartition topics detected:
> 1. Topic: k-streams_app_id-Topic1-repartition
> Trigger Operator: KSTREAM-AGGREGATE-0000000003 (groupByKey)
> Upstream Cause: Key was flagged as changed by an upstream operator
> (selectKey)
> 2. Topic: k-streams_app_id-Topic2-repartition
> Trigger Operator: KSTREAM-JOIN-0000000008 (join)
> {code}
> *Impact & CompatibilityBackward*
> * Compatible: Yes. It only introduces diagnostic log messages. Existing
> applications will run completely unchanged.
> * Performance Impact: Negligible. The structural topology scan runs exactly
> once during the KafkaStreams startup sequence.
> * Opt-out: Users can disable this output by setting
> streams.warn.on.repartition=false.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)