[ https://issues.apache.org/jira/browse/KAFKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Nigam reassigned KAFKA-1599:
-------------------------------------
Assignee: Abhishek Nigam
> Change preferred replica election admin command to handle large clusters
> ------------------------------------------------------------------------
>
> Key: KAFKA-1599
> URL: https://issues.apache.org/jira/browse/KAFKA-1599
> Project: Kafka
> Issue Type: Improvement
> Affects Versions: 0.8.2.0
> Reporter: Todd Palino
> Assignee: Abhishek Nigam
> Labels: newbie++
>
> We ran into a problem with a cluster that has 70k partitions: we could not
> trigger a preferred replica election for all topics and partitions using the
> admin tool. The cause was that the JSON object written to the admin znode to
> tell the controller to start the election was 1.8 MB in size. Since the
> default ZooKeeper data size limit is 1 MB, and changing it is non-trivial, we
> should come up with a better way to represent the list of topics and
> partitions for this admin command.
> I have several thoughts on this so far:
> 1) Trigger the command for all topics and partitions with a JSON object that
> does not include an explicit list of them (i.e. a flag that says "all
> partitions")
> 2) Use a more compact JSON representation. Currently, the JSON contains a
> 'partitions' key which holds a list of dictionaries that each have a 'topic'
> and 'partition' key, with one list item for each partition. This repeats the
> same key names for every partition, which is unnecessary. A format like the
> following would be much more compact:
> {"topics": {"topicName1": [0, 1, 2, 3], "topicName2": [0, 1]}, "version": 1}
> 3) Use a representation other than JSON. A text encoding is inefficient; a
> binary format would be the most compact. This puts a greater burden on tools
> and scripts that do not use the built-in libraries, but the burden is not
> too high.
> 4) Use a representation that involves multiple znodes. A structured tree in
> the admin command would probably provide the most complete solution. However,
> we would need to make sure a wide tree does not exceed the data size limit
> (the list of children for any single znode cannot exceed the ZK data size
> limit of 1 MB).
> Obviously, there could be a combination of #1 with a change in the
> representation, which would likely be appropriate as well.
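
To make the size argument in option 2 concrete, here is a minimal Python
sketch that builds both representations for the same partition list and
compares their encoded lengths. The function names, the topic names, and the
cluster shape (7,000 topics with 10 partitions each) are hypothetical, and the
"version" field in the per-partition form is an assumption; only the
'partitions'/'topic'/'partition' keys and the proposed 'topics' layout come
from the description above.

```python
import json
from collections import defaultdict


def verbose_payload(partitions):
    """Per-partition form: one {'topic', 'partition'} dict per partition."""
    return json.dumps({
        "version": 1,  # assumed field, not stated in the issue text
        "partitions": [{"topic": t, "partition": p} for t, p in partitions],
    })


def compact_payload(partitions):
    """Proposed form: each topic name appears once, mapped to its partition ids."""
    by_topic = defaultdict(list)
    for topic, partition in partitions:
        by_topic[topic].append(partition)
    # Compact separators drop the spaces json.dumps inserts by default.
    return json.dumps({"topics": dict(by_topic), "version": 1},
                      separators=(",", ":"))


def expand(payload):
    """Turn the compact form back into (topic, partition) pairs."""
    doc = json.loads(payload)
    return [(t, p) for t, ps in doc["topics"].items() for p in ps]


# Hypothetical cluster shape: 7,000 topics x 10 partitions = 70,000 partitions.
partitions = [(f"topic{n}", p) for n in range(7000) for p in range(10)]

verbose = verbose_payload(partitions)
compact = compact_payload(partitions)
print("verbose bytes:", len(verbose.encode("utf-8")))
print("compact bytes:", len(compact.encode("utf-8")))
```

The actual savings depend on topic name lengths and on how many partitions
each topic has. A very large cluster could still exceed the limit with the
compact form, which is why combining it with option 1 may be worthwhile.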