[ 
https://issues.apache.org/jira/browse/KAFKA-20593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18081624#comment-18081624
 ] 

Matthias J. Sax commented on KAFKA-20593:
-----------------------------------------

If we really only add a log line, there is no need for a KIP. Of course, if we 
want to add a config for it, we would need a KIP. But why would we need a 
config? Personally, I don't see a lot of value in this ticket anyway.

Creating internal topics is business as usual. WARN logs are about, well, 
warning about something. We could of course still log at INFO, but overall, I 
am still not sure what the problem is:
{quote}Currently, discovering these requires manually printing 
Topology.describe()
{quote}
What is the issue with just doing this manually?
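
As a rough illustration of the manual route, the sketch below scans the text 
printed by Topology#describe() for sink nodes that write to topics ending in 
"-repartition". The class name, the helper method, and the sample description 
are assumptions made up for this example; in an application the input string 
would come from topology.describe().toString().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepartitionTopicScan {

    // Illustrative text in the shape printed by Topology#describe().
    // The node and topic names are invented for this example.
    static final String SAMPLE = String.join("\n",
        "Topologies:",
        "   Sub-topology: 0",
        "    Source: KSTREAM-SOURCE-0000000000 (topics: [Topic1])",
        "      --> KSTREAM-KEY-SELECT-0000000001",
        "    Processor: KSTREAM-KEY-SELECT-0000000001 (stores: [])",
        "      --> KSTREAM-SINK-0000000003",
        "      <-- KSTREAM-SOURCE-0000000000",
        "    Sink: KSTREAM-SINK-0000000003 (topic: Topic1-repartition)",
        "      <-- KSTREAM-KEY-SELECT-0000000001",
        "   Sub-topology: 1",
        "    Source: KSTREAM-SOURCE-0000000004 (topics: [Topic1-repartition])",
        "      --> KSTREAM-SINK-0000000005",
        "    Sink: KSTREAM-SINK-0000000005 (topic: output)",
        "      <-- KSTREAM-SOURCE-0000000004");

    // Matches sink lines whose target topic name ends in "-repartition".
    private static final Pattern REPARTITION_SINK =
        Pattern.compile("Sink: \\S+ \\(topic: ([^\\s)]+-repartition)\\)");

    // Returns the repartition topic names found in the description text,
    // in the order in which they appear.
    static List<String> findRepartitionTopics(String description) {
        List<String> topics = new ArrayList<>();
        Matcher m = REPARTITION_SINK.matcher(description);
        while (m.find()) {
            topics.add(m.group(1));
        }
        return topics;
    }

    public static void main(String[] args) {
        for (String topic : findRepartitionTopics(SAMPLE)) {
            System.out.println("Repartition topic: " + topic);
        }
    }
}
```

This only matches on the printed text, so it says nothing about which upstream 
operator changed the key; that still has to be read off the surrounding nodes.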

Also, if you really want to create topics manually, there is already 
https://issues.apache.org/jira/browse/KAFKA-10357, which has an accepted KIP 
and is WIP.

> Log a diagnostic warning at startup when Kafka Streams detects internal 
> repartition topics
> ------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-20593
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20593
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Ankur Sinha
>            Priority: Minor
>
> *The Problem*
> When a topology changes a record key (e.g., .selectKey(), .map()) and 
> subsequently calls a stateful operation (e.g., .groupByKey(), .join()), Kafka 
> Streams automatically provisions and manages internal repartition topics on 
> the broker cluster. While this is core architectural behavior, these topics 
> are created implicitly. Developers frequently introduce accidental, highly 
> expensive network shuffles without realizing the operational and cloud 
> infrastructure cost impact. Currently, discovering these requires manually 
> printing Topology.describe() or digging through noisy, verbose consumer group 
> rebalance logs long after the application has started. For example, a 
> developer today has to parse runtime blocks like this just to find hidden 
> shuffles:
> {code:java}
> 2026-05-11T16:18:33.861Z  INFO 1 --- [k-streams] [-StreamThread-1] 
> o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer 
> clientId=k-streams-1-consumer, groupId=k-streams_app_id] Updating assignment 
> with
>         Assigned partitions: [Topic1-3, 
> k-streams_app_id-Topic2-repartition-1, k-streams_app_id-Topic2-repartition-5, 
> k-streams_app_id-Topic3-repartition-3...]
> {code}
> This log makes it difficult to separate actual data storage from internal 
> shuffle infrastructure, and it fails to explain which specific operator 
> triggered the repartition.
> *The Solution*
> Add a configuration property that scans the compiled topology during 
> initialization. If Kafka Streams detects that internal repartition topics 
> will be generated on the cluster, it will log a clean, structured WARN block 
> explicitly detailing them before the application begins processing data.
> New Configuration Property:
> {code:java}
> streams.warn.on.repartition (Boolean, default: true)
> {code}
> Proposed Warning Log Output Example:
> {code:java}
> WARN  org.apache.kafka.streams.KafkaStreams - [Topology Diagnostics] Internal 
> repartition topics detected:
> 1. Topic: k-streams_app_id-Topic1-repartition
>    Trigger Operator: KSTREAM-AGGREGATE-0000000003 (groupByKey)
>    Upstream Cause: Key was flagged as changed by an upstream operator 
> (selectKey)
> 2. Topic: k-streams_app_id-Topic2-repartition
>    Trigger Operator: KSTREAM-JOIN-0000000008 (join)
> {code}
> *Impact & Compatibility*
> * Backward Compatible: Yes. It only introduces diagnostic log messages. Existing 
> applications will run completely unchanged.
> * Performance Impact: Negligible. The structural topology scan runs exactly 
> once during the KafkaStreams startup sequence.
> * Opt-out: Users can disable this output by setting 
> streams.warn.on.repartition=false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
