Ankur Sinha created KAFKA-20593:
-----------------------------------
Summary: Log a diagnostic warning at startup when Kafka Streams
detects internal repartition topics
Key: KAFKA-20593
URL: https://issues.apache.org/jira/browse/KAFKA-20593
Project: Kafka
Issue Type: New Feature
Components: streams
Reporter: Ankur Sinha
*The Problem*
When a topology changes a record key (e.g., .selectKey(), .map()) and
subsequently calls a stateful operation (e.g., .groupByKey(), .join()), Kafka
Streams automatically provisions and manages internal repartition topics on the
broker cluster.While this is core architectural behavior, these topics are
created implicitly. Developers frequently introduce accidental, highly
expensive network shuffles without realizing the operational and cloud
infrastructure cost impact.Currently, discovering these requires manually
printing Topology.describe() or digging through noisy, verbose consumer group
rebalance logs long after the application has started. For example, a developer
today has to parse runtime blocks like this just to find hidden shuffles:
{code:java}
2026-05-11T16:18:33.861Z INFO 1 --- [k-streams] [-StreamThread-1]
o.a.k.c.c.internals.ConsumerCoordinator : [Consumer
clientId=k-streams-1-consumer, groupId=k-streams_app_id] Updating assignment
with
Assigned partitions: [Topic1-3, k-streams_app_id-Topic2-repartition-1,
k-streams_app_id-Topic2-repartition-5, k-streams_app_id-Topic3-repartition-3...]
{code}
This log makes it difficult to separate actual data storage from internal
shuffle infrastructure, and it fails to explain which specific operator
triggered the repartition.
*The Solution*
Add a configuration property that scans the compiled topology during
initialization. If Kafka Streams detects that internal repartition topics will
be generated on the cluster, it will log a clean, structured WARN block
explicitly detailing them before the application begins processing data.New
Configuration Property:
{code:java}
streams.warn.on.repartition (Boolean, default: true)
{code}
Proposed Warning Log Output Example:
{code:java}
WARN org.apache.kafka.streams.KafkaStreams - [Topology Diagnostics] Internal
repartition topics detected:
1. Topic: k-streams_app_id-Topic1-repartition
Trigger Operator: KSTREAM-AGGREGATE-0000000003 (groupByKey)
Upstream Cause: Key was flagged as changed by an upstream operator
(selectKey)
2. Topic: k-streams_app_id-Topic2-repartition
Trigger Operator: KSTREAM-JOIN-0000000008 (join)
{code}
*Impact & CompatibilityBackward*
* Compatible: Yes. It only introduces diagnostic log messages. Existing
applications will run completely unchanged.
* Performance Impact: Negligible. The structural topology scan runs exactly
once during the KafkaStreams startup sequence.
* Opt-out: Users can disable this output by setting
streams.warn.on.repartition=false.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)