[ 
https://issues.apache.org/jira/browse/KAFKA-20593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18081325#comment-18081325
 ] 

Jahnavi Patel commented on KAFKA-20593:
---------------------------------------

Can I assign this ticket to myself? I'd like to work on it.

> Log a diagnostic warning at startup when Kafka Streams detects internal 
> repartition topics
> ------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-20593
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20593
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Ankur Sinha
>            Priority: Minor
>
> *The Problem*
> When a topology changes a record key (e.g., .selectKey(), .map()) and 
> subsequently calls a stateful operation (e.g., .groupByKey(), .join()), Kafka 
> Streams automatically provisions and manages internal repartition topics on 
> the broker cluster.While this is core architectural behavior, these topics 
> are created implicitly. Developers frequently introduce accidental, highly 
> expensive network shuffles without realizing the operational and cloud 
> infrastructure cost impact.Currently, discovering these requires manually 
> printing Topology.describe() or digging through noisy, verbose consumer group 
> rebalance logs long after the application has started. For example, a 
> developer today has to parse runtime blocks like this just to find hidden 
> shuffles:
> {code:java}
> 2026-05-11T16:18:33.861Z  INFO 1 --- [k-streams] [-StreamThread-1] 
> o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer 
> clientId=k-streams-1-consumer, groupId=k-streams_app_id] Updating assignment 
> with
>         Assigned partitions: [Topic1-3, 
> k-streams_app_id-Topic2-repartition-1, k-streams_app_id-Topic2-repartition-5, 
> k-streams_app_id-Topic3-repartition-3...]
> {code}
> This log makes it difficult to separate actual data storage from internal 
> shuffle infrastructure, and it fails to explain which specific operator 
> triggered the repartition.
> *The Solution*
> Add a configuration property that scans the compiled topology during 
> initialization. If Kafka Streams detects that internal repartition topics 
> will be generated on the cluster, it will log a clean, structured WARN block 
> explicitly detailing them before the application begins processing data.New 
> Configuration Property:
> {code:java}
> streams.warn.on.repartition (Boolean, default: true)
> {code}
> Proposed Warning Log Output Example:
> {code:java}
> WARN  org.apache.kafka.streams.KafkaStreams - [Topology Diagnostics] Internal 
> repartition topics detected:
> 1. Topic: k-streams_app_id-Topic1-repartition
>    Trigger Operator: KSTREAM-AGGREGATE-0000000003 (groupByKey)
>    Upstream Cause: Key was flagged as changed by an upstream operator 
> (selectKey)
> 2. Topic: k-streams_app_id-Topic2-repartition
>    Trigger Operator: KSTREAM-JOIN-0000000008 (join)
> {code}
> *Impact & CompatibilityBackward* 
> * Compatible: Yes. It only introduces diagnostic log messages. Existing 
> applications will run completely unchanged.
> * Performance Impact: Negligible. The structural topology scan runs exactly 
> once during the KafkaStreams startup sequence.
> * Opt-out: Users can disable this output by setting 
> streams.warn.on.repartition=false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to