Ankur Sinha created KAFKA-20593:
-----------------------------------

             Summary: Log a diagnostic warning at startup when Kafka Streams 
detects internal repartition topics
                 Key: KAFKA-20593
                 URL: https://issues.apache.org/jira/browse/KAFKA-20593
             Project: Kafka
          Issue Type: New Feature
          Components: streams
            Reporter: Ankur Sinha


*The Problem*
When a topology changes a record key (e.g., .selectKey(), .map()) and 
subsequently calls a stateful operation (e.g., .groupByKey(), .join()), Kafka 
Streams automatically provisions and manages internal repartition topics on the 
broker cluster.While this is core architectural behavior, these topics are 
created implicitly. Developers frequently introduce accidental, highly 
expensive network shuffles without realizing the operational and cloud 
infrastructure cost impact.Currently, discovering these requires manually 
printing Topology.describe() or digging through noisy, verbose consumer group 
rebalance logs long after the application has started. For example, a developer 
today has to parse runtime blocks like this just to find hidden shuffles:

{code:java}
2026-05-11T16:18:33.861Z  INFO 1 --- [k-streams] [-StreamThread-1] 
o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer 
clientId=k-streams-1-consumer, groupId=k-streams_app_id] Updating assignment 
with
        Assigned partitions: [Topic1-3, k-streams_app_id-Topic2-repartition-1, 
k-streams_app_id-Topic2-repartition-5, k-streams_app_id-Topic3-repartition-3...]
{code}

This log makes it difficult to separate actual data storage from internal 
shuffle infrastructure, and it fails to explain which specific operator 
triggered the repartition.

*The Solution*
Add a configuration property that scans the compiled topology during 
initialization. If Kafka Streams detects that internal repartition topics will 
be generated on the cluster, it will log a clean, structured WARN block 
explicitly detailing them before the application begins processing data.New 
Configuration Property:


{code:java}
streams.warn.on.repartition (Boolean, default: true)
{code}


Proposed Warning Log Output Example:


{code:java}
WARN  org.apache.kafka.streams.KafkaStreams - [Topology Diagnostics] Internal 
repartition topics detected:

1. Topic: k-streams_app_id-Topic1-repartition
   Trigger Operator: KSTREAM-AGGREGATE-0000000003 (groupByKey)
   Upstream Cause: Key was flagged as changed by an upstream operator 
(selectKey)

2. Topic: k-streams_app_id-Topic2-repartition
   Trigger Operator: KSTREAM-JOIN-0000000008 (join)
{code}

*Impact & CompatibilityBackward* 
* Compatible: Yes. It only introduces diagnostic log messages. Existing 
applications will run completely unchanged.
* Performance Impact: Negligible. The structural topology scan runs exactly 
once during the KafkaStreams startup sequence.
* Opt-out: Users can disable this output by setting 
streams.warn.on.repartition=false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to