[
https://issues.apache.org/jira/browse/KAFKA-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126166#comment-17126166
]
Boyang Chen commented on KAFKA-9377:
------------------------------------
[~feyman] Hey, I'm assigning this ticket to you. A little more context:
You could search for the function
`StreamsPartitionAssignor#setRepartitionTopicMetadataNumberOfPartitions` and
look at what it does currently. It basically tries to initialize the partition
count of every repartition topic by walking through every node in arbitrary
order inside a loop with no fixed bound. This is neither efficient nor
intuitive, and we have been planning to refactor it into a bottom-up DFS,
meaning that a parent node can only be initialized after all its children's
topic partitions are initialized. Another reference is the unit test case
`StreamsPartitionAssignor#shouldNotFailOnBranchedMultiLevelRepartitionConnectedTopology`,
which validates a bug we fixed inside this logic a while ago and
hopefully gives you better insight.
Let me know if this makes sense to you. I know it's a bit unfriendly for a
beginner task, but it is definitely worth your effort to dig in and understand
how the KStream topology is created.
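
To make the bottom-up idea a bit more concrete, here is a rough sketch of a
memoized DFS. It is not the actual `StreamsPartitionAssignor` code: the class
name, the `upstream` map, and the rule that a repartition topic takes the
maximum partition count of the topics feeding it are all assumptions made for
illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Kafka code base.
final class RepartitionCountSketch {

    /**
     * Resolves a partition count for every repartition topic.
     *
     * @param upstream     repartition topic -> topics it depends on (assumed shape)
     * @param sourceCounts partition counts of external topics, already known
     * @return counts for the external topics plus every resolved repartition topic
     */
    static Map<String, Integer> resolveAll(Map<String, List<String>> upstream,
                                           Map<String, Integer> sourceCounts) {
        Map<String, Integer> resolved = new HashMap<>(sourceCounts);
        for (String topic : upstream.keySet()) {
            resolve(topic, upstream, resolved, new HashSet<>());
        }
        return resolved;
    }

    private static int resolve(String topic,
                               Map<String, List<String>> upstream,
                               Map<String, Integer> resolved,
                               Set<String> inProgress) {
        Integer known = resolved.get(topic);
        if (known != null) {
            return known; // already initialized, visit each topic only once
        }
        List<String> deps = upstream.get(topic);
        if (deps == null || deps.isEmpty()) {
            throw new IllegalStateException("No partition count for topic " + topic);
        }
        if (!inProgress.add(topic)) {
            throw new IllegalStateException("Cycle detected at topic " + topic);
        }
        // A topic is initialized only after everything it depends on is.
        // Assumption: it takes the max count of its dependencies.
        int count = 0;
        for (String dep : deps) {
            count = Math.max(count, resolve(dep, upstream, resolved, inProgress));
        }
        inProgress.remove(topic);
        resolved.put(topic, count);
        return count;
    }
}
```

Each topic is resolved exactly once, and a cyclic dependency fails fast with an
exception instead of spinning in a loop.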
> Refactor StreamsPartitionAssignor Repartition Count logic
> ---------------------------------------------------------
>
> Key: KAFKA-9377
> URL: https://issues.apache.org/jira/browse/KAFKA-9377
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Boyang Chen
> Assignee: feyman
> Priority: Major
>
> The current repartition count logic uses a big while loop to initialize
> each repartition topic's partition count in arbitrary order, which is
> error-prone and hard to maintain. A more intuitive and robust solution
> would be a bottom-up DFS, where we initialize the repartition topic counts
> of all the sink nodes by making sure all their parents are initialized.