GitHub user uncleGen opened a pull request:
https://github.com/apache/kafka/pull/3894
KAFKA-5928: Avoid redundant requests to zookeeper when reassign topic partition
We currently request topic-level information from ZooKeeper once for every
partition listed in the assignment JSON file, even though that information
only changes per topic. For example, at
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/ReassignPartitionsCommand.scala#L550:
```scala
val validPartitions = proposedPartitionAssignment.filter { case (p, _) =>
  validatePartition(zkUtils, p.topic, p.partition)
}
```
When reassigning 1000 partitions spread across 10 topics, the `filter` call in
`ReassignPartitionsCommand` sends 1000 requests to ZooKeeper, although 10
requests (one per topic) would be enough.
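One way to send only one request per topic is to fetch the partition set of each distinct topic once and then validate every partition against that cached result. The Scala sketch below illustrates that idea; it is not the code in the patch. `TopicPartition`, `fetchPartitionsForTopic` and the `zkReads` counter are hypothetical stand-ins for Kafka's types and ZooKeeper reads, and the fake lookup assumes every topic has 100 partitions.

```scala
// Hypothetical model of a topic-partition (not Kafka's own class).
final case class TopicPartition(topic: String, partition: Int)

object PerTopicValidation {
  // Counts simulated ZooKeeper reads.
  var zkReads = 0

  // Hypothetical stand-in for a ZooKeeper read of a topic's partition ids.
  // Assumption: every topic has partitions 0 until 100.
  def fetchPartitionsForTopic(topic: String): Set[Int] = {
    zkReads += 1
    (0 until 100).toSet
  }

  def validPartitions(
      proposed: Map[TopicPartition, Seq[Int]]): Map[TopicPartition, Seq[Int]] = {
    // One read per distinct topic, cached in a map keyed by topic name.
    val partitionsByTopic: Map[String, Set[Int]] =
      proposed.keySet.map(_.topic).map(t => t -> fetchPartitionsForTopic(t)).toMap
    // Every partition is then checked against the cached set, with no extra reads.
    proposed.filter { case (p, _) => partitionsByTopic(p.topic).contains(p.partition) }
  }
}
```

For the same input of 1000 partitions across 10 topics, `zkReads` ends at 10, and a partition id outside a topic's range (for example partition 500) is still filtered out.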
We tested a large-scale assignment of about 10K partitions: it took tens of
minutes. With this optimization, it takes less than 1 minute.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uncleGen/kafka KAFKA-5928
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/kafka/pull/3894.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3894
----
commit f6c30e81c7110f72e254bb9dfa81a25f951b70a1
Author: æ¨è® <[email protected]>
Date: 2017-09-19T03:01:20Z
Avoid redundant requests to zookeeper when reassign topic partition
----