[jira] [Commented] (KAFKA-9377) Refactor StreamsPartitionAssignor Repartition Count logic

Boyang Chen (Jira) Thu, 04 Jun 2020 13:05:54 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126166#comment-17126166
 ]


Boyang Chen commented on KAFKA-9377:
------------------------------------

[~feyman] Hey, I'm assigning this ticket to you. A little more context:

You could search for the function 
`StreamsPartitionAssignor#setRepartitionTopicMetadataNumberOfPartitions` and 
look at what it currently does. It tries to initialize the partition count of 
every repartition topic by walking through all the nodes in arbitrary order, 
inside a loop that repeats until every count is set. This is neither efficient 
nor intuitive, and we have been planning to refactor it into a bottom-up DFS, 
meaning that a parent node can only be initialized after all of its children's 
topic partitions are initialized. Another reference is the unit test case 
`StreamsPartitionAssignor#shouldNotFailOnBranchedMultiLevelRepartitionConnectedTopology`,
which validates a bug we fixed inside this logic a while ago and will 
hopefully give you better insight.
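
To make the bottom-up idea concrete, here is a minimal sketch and not the actual Kafka Streams code. The class `RepartitionCountSketch`, the `upstreamTopics` map, and the rule "a repartition topic takes the max partition count of the topics feeding it" are all assumptions made for illustration. Each topic is resolved only after everything it depends on has been resolved, and results are memoized so no topic is visited twice.

```java
import java.util.*;

// Hypothetical model for illustration only; not the real StreamsPartitionAssignor types.
class RepartitionCountSketch {
    // For each repartition topic: the topics read by the sub-topology that writes to it.
    private final Map<String, List<String>> upstreamTopics;
    // Starts with the known counts of external source topics; filled in as we resolve.
    private final Map<String, Integer> partitionCounts;
    // Topics on the current DFS path, used to detect cycles.
    private final Set<String> inProgress = new HashSet<>();

    RepartitionCountSketch(Map<String, List<String>> upstreamTopics,
                           Map<String, Integer> knownCounts) {
        this.upstreamTopics = upstreamTopics;
        this.partitionCounts = new HashMap<>(knownCounts);
    }

    // Depth-first resolution: a topic's count is set only after all of its
    // upstream topics have been resolved.
    int resolve(String topic) {
        Integer known = partitionCounts.get(topic);
        if (known != null) {
            return known;
        }
        List<String> upstream = upstreamTopics.get(topic);
        if (upstream == null || upstream.isEmpty()) {
            throw new IllegalStateException("Cannot determine partition count for " + topic);
        }
        if (!inProgress.add(topic)) {
            throw new IllegalStateException("Cycle detected at " + topic);
        }
        int max = 0;
        for (String dependency : upstream) {
            // Assumed rule: take the largest partition count among upstream topics.
            max = Math.max(max, resolve(dependency));
        }
        inProgress.remove(topic);
        partitionCounts.put(topic, max);
        return max;
    }

    public static void main(String[] args) {
        Map<String, List<String>> upstream = Map.of(
            "repartition-A", List.of("input-1", "input-2"),
            "repartition-B", List.of("repartition-A", "input-3"));
        Map<String, Integer> known = Map.of("input-1", 4, "input-2", 6, "input-3", 2);
        RepartitionCountSketch sketch = new RepartitionCountSketch(upstream, known);
        System.out.println(sketch.resolve("repartition-B")); // prints 6
    }
}
```

Compared with looping until nothing changes, this visits each topic once and fails fast, with a clear error, when a count cannot be determined.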

Let me know if this makes sense to you. I know it's a bit unfriendly for a 
beginner task, but it is definitely worth your effort to dig in and understand 
how the KStream topology is created.

> Refactor StreamsPartitionAssignor Repartition Count logic
> ---------------------------------------------------------
>
>                 Key: KAFKA-9377
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9377
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Boyang Chen
>            Assignee: feyman
>            Priority: Major
>
> The current repartition count logic uses a big while loop to initialize 
> each repartition topic's count in arbitrary order, which is error-prone and 
> hard to maintain. A more intuitive and robust solution would be a bottom-up 
> DFS, where we initialize all the sink nodes' repartition topic counts by 
> making sure all their parents are initialized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
