[ https://issues.apache.org/jira/browse/KAFKA-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567441#comment-16567441 ]
ASF GitHub Bot commented on KAFKA-6761:
---------------------------------------

bbejeck opened a new pull request #5451: KAFKA-6761: Reduce streams footprint part IV add optimization
URL: https://github.com/apache/kafka/pull/5451

This PR adds an optimization that eliminates redundant repartition topics. When the `KStream` produced by a key-changing operation feeds several downstream operations that use the new key, those operations no longer each get their own repartition topic; the repartition topics are reduced to one.

Note that this PR leaves in place the existing optimization that reuses a source topic as the changelog topic for source `KTable` instances. I'll have another follow-up PR to move the source-topic optimization into a method within `InternalStreamsBuilder`, so both optimizations are performed in the same area of the code.

Additionally, the current value of `StreamsConfig.OPTIMIZE` is `all`, and we'll need another KIP to change the value to `2.1`.

An integration test, `RepartitionOptimizingIntegrationTest`, asserts that an optimized topology with one repartition topic produces the same results as the un-optimized version with four repartition topics. More tests will be added, but I wanted to get reviews on the approach now.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

> Reduce Kafka Streams Footprint
> ------------------------------
>
>                 Key: KAFKA-6761
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6761
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Bill Bejeck
>            Assignee: Bill Bejeck
>            Priority: Major
>             Fix For: 2.1.0
>
>
> The persistent storage footprint of a Kafka Streams application consists of
> the following aspects:
> # The internal topics created on the Kafka cluster side.
> # The materialized state stores on the Kafka Streams application instances
> side.
> There have been some questions about reducing these footprints, especially
> since many of them are not necessary. For example, there are redundant
> internal topics, as well as unnecessary state stores that take up space and
> also affect performance. When people push Streams to production with high
> traffic, this issue becomes more common and severe. Reducing the footprint
> of Streams has clear benefits: it lowers the resource utilization of Kafka
> Streams applications and avoids putting pressure on broker capacity.
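The repartition-topic optimization described in the pull request above can be illustrated with a small, self-contained toy model. This is a sketch under stated assumptions, not Kafka Streams' implementation: the classes `RepartitionSketch`, `Operation`, and `KeyChange`, and the `-repartition` topic naming used here, are hypothetical. The model assumes that, without the optimization, every downstream operation needing records partitioned by the new key (an aggregation or a join, for example) gets its own repartition topic, while the optimized topology writes the re-keyed stream once to a single shared repartition topic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT Kafka Streams' internal API: it only counts the
// repartition topics a re-keyed stream would need with and without the
// "merge repartition topics" optimization.
public class RepartitionSketch {

    // An operation applied to the re-keyed stream (hypothetical type).
    static final class Operation {
        final String name;
        // True for operations that need records partitioned by the new key,
        // e.g. groupByKey().count() or a join; false for e.g. mapValues().
        final boolean needsRepartition;

        Operation(String name, boolean needsRepartition) {
            this.name = name;
            this.needsRepartition = needsRepartition;
        }
    }

    // A key-changing step (such as selectKey) plus the operations reading its output.
    static final class KeyChange {
        final String name;
        final List<Operation> downstream;

        KeyChange(String name, List<Operation> downstream) {
            this.name = name;
            this.downstream = downstream;
        }
    }

    // Unoptimized: one repartition topic per downstream operation that needs one.
    static List<String> unoptimizedTopics(KeyChange kc) {
        List<String> topics = new ArrayList<>();
        for (Operation op : kc.downstream) {
            if (op.needsRepartition) {
                topics.add(op.name + "-repartition");
            }
        }
        return topics;
    }

    // Optimized: the re-keyed stream is written once to a single shared
    // repartition topic, provided at least one downstream operation needs it.
    static List<String> optimizedTopics(KeyChange kc) {
        List<String> topics = new ArrayList<>();
        for (Operation op : kc.downstream) {
            if (op.needsRepartition) {
                topics.add(kc.name + "-repartition");
                break; // one topic serves every consumer of the new key
            }
        }
        return topics;
    }

    public static void main(String[] args) {
        KeyChange rekey = new KeyChange("rekey", List.of(
                new Operation("count", true),
                new Operation("reduce", true),
                new Operation("join", true),
                new Operation("mapValues", false)));
        System.out.println("unoptimized: " + unoptimizedTopics(rekey));
        System.out.println("optimized:   " + optimizedTopics(rekey));
    }
}
```

Running `main` prints three repartition topics for the unoptimized case and a single `rekey-repartition` topic for the optimized one; `mapValues` keeps the key, so it never needs repartitioning. In Kafka Streams 2.1 itself, as we understand it, the optimization is opt-in: set `StreamsConfig.TOPOLOGY_OPTIMIZATION` to `StreamsConfig.OPTIMIZE` in the application's `Properties` and pass the same properties to `StreamsBuilder.build(Properties)`, then inspect `Topology.describe()` to see which repartition topics remain.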