[ 
https://issues.apache.org/jira/browse/KAFKA-20593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18081627#comment-18081627
 ] 

Ankur Sinha commented on KAFKA-20593:
-------------------------------------

Thanks for the feedback, [~mjsax].

I agree that introducing a dedicated config for this is probably unnecessary, 
especially if the behavior is only logging.

The main motivation behind the proposal was less about topic creation itself 
being problematic, and more about improving visibility into implicit 
repartitioning steps  during develpment and deployment. While 
`Topology.describe()` does expose the information, in practice many users 
(especially newer Kafka Streams users) do not inspect the topology unless 
debugging performance or operational issues later on. 
Repartition topics are often only discovered indirectly through unexpected 
internal topics and additional latency

The intent was to make repartiioning behavior more visible proactively. That 
said, I undertand the concern that WARN level may be too strong since 
repartitioning is expected behavior and not inherently probleatic. INFO-level 
may be a more appropriate direction.

I’ll revisit the proposal with a smaller scope and think more about whether 
there is a clearer operational gap beyond what `Topology.describe()` already 
provides.


> Log a diagnostic warning at startup when Kafka Streams detects internal 
> repartition topics
> ------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-20593
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20593
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Ankur Sinha
>            Priority: Minor
>
> *The Problem*
> When a topology changes a record key (e.g., .selectKey(), .map()) and 
> subsequently calls a stateful operation (e.g., .groupByKey(), .join()), Kafka 
> Streams automatically provisions and manages internal repartition topics on 
> the broker cluster.While this is core architectural behavior, these topics 
> are created implicitly. Developers frequently introduce accidental, highly 
> expensive network shuffles without realizing the operational and cloud 
> infrastructure cost impact.Currently, discovering these requires manually 
> printing Topology.describe() or digging through noisy, verbose consumer group 
> rebalance logs long after the application has started. For example, a 
> developer today has to parse runtime blocks like this just to find hidden 
> shuffles:
> {code:java}
> 2026-05-11T16:18:33.861Z  INFO 1 --- [k-streams] [-StreamThread-1] 
> o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer 
> clientId=k-streams-1-consumer, groupId=k-streams_app_id] Updating assignment 
> with
>         Assigned partitions: [Topic1-3, 
> k-streams_app_id-Topic2-repartition-1, k-streams_app_id-Topic2-repartition-5, 
> k-streams_app_id-Topic3-repartition-3...]
> {code}
> This log makes it difficult to separate actual data storage from internal 
> shuffle infrastructure, and it fails to explain which specific operator 
> triggered the repartition.
> *The Solution*
> Add a configuration property that scans the compiled topology during 
> initialization. If Kafka Streams detects that internal repartition topics 
> will be generated on the cluster, it will log a clean, structured WARN block 
> explicitly detailing them before the application begins processing data.New 
> Configuration Property:
> {code:java}
> streams.warn.on.repartition (Boolean, default: true)
> {code}
> Proposed Warning Log Output Example:
> {code:java}
> WARN  org.apache.kafka.streams.KafkaStreams - [Topology Diagnostics] Internal 
> repartition topics detected:
> 1. Topic: k-streams_app_id-Topic1-repartition
>    Trigger Operator: KSTREAM-AGGREGATE-0000000003 (groupByKey)
>    Upstream Cause: Key was flagged as changed by an upstream operator 
> (selectKey)
> 2. Topic: k-streams_app_id-Topic2-repartition
>    Trigger Operator: KSTREAM-JOIN-0000000008 (join)
> {code}
> *Impact & CompatibilityBackward* 
> * Compatible: Yes. It only introduces diagnostic log messages. Existing 
> applications will run completely unchanged.
> * Performance Impact: Negligible. The structural topology scan runs exactly 
> once during the KafkaStreams startup sequence.
> * Opt-out: Users can disable this output by setting 
> streams.warn.on.repartition=false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to