[ https://issues.apache.org/jira/browse/KAFKA-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355824#comment-16355824 ]
ASF GitHub Bot commented on KAFKA-6469:
---------------------------------------

ambroff opened a new pull request #4540: KAFKA-6469 Batch ISR change notifications
URL: https://github.com/apache/kafka/pull/4540

The trouble starts when writes to /isr_change_notification in ZooKeeper (which is effectively a queue of ISR change events for the controller) happen at a rate high enough that the node holding the watch can't dequeue them. The watcher kafka.controller.IsrChangeNotificationListener fires in the controller when a new entry is written to /isr_change_notification, and the zkclient library sends a GetChildrenRequest to ZooKeeper to fetch all child znodes.

We've seen failures in one of our test clusters as the partition count started to climb north of 60k per broker. We had brokers writing child znodes under /isr_change_notification that were larger than the jute.maxbuffer size in ZooKeeper (1MB), causing the ZooKeeper server to drop the controller's session and effectively bricking the cluster.

This can be partially mitigated by chunking ISR notifications to increase the maximum number of partitions a broker can host, which is the purpose of this patch. KafkaZkClient#propagateIsrChanges() now splits the set of TopicPartitions to be written to the queue into batches of at most isr.notification.batch.size partitions, which defaults to 3000. This default is an approximation chosen so that the JSON-serialized collection always stays well under 1MB. The worst-case scenario is exercised in KafkaZkClientTest#testPropagateLargeNumberOfIsrChanges(), which provides a set of 5000 TopicPartitions with the longest possible JSON representation; this leads to a JSON payload of around 850k, leaving headroom for additional metadata.
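To make the batching described above concrete, here is a minimal sketch of the grouping step. It is not the code from the patch: the object name, the createIsrChangeNotification stub, and the JSON shape are illustrative assumptions; only propagateIsrChanges, the isr.notification.batch.size default of 3000, and the /isr_change_notification parent path come from the description above.

```scala
import org.apache.kafka.common.TopicPartition

object IsrChangeBatchingSketch {

  // Default from the PR description: batches of at most 3000 partitions keep the
  // serialized payload well under ZooKeeper's 1MB jute.maxbuffer.
  val IsrNotificationBatchSize = 3000

  // Hypothetical stand-in for the znode write performed by the real KafkaZkClient:
  // it would create one sequential child under /isr_change_notification per call.
  def createIsrChangeNotification(json: String): Unit =
    println(s"would write ${json.length} bytes under /isr_change_notification")

  // Illustrative JSON encoding of one batch; the real serialization may differ.
  def encodeBatch(partitions: Seq[TopicPartition]): String = {
    val entries = partitions.map(tp => s"""{"topic":"${tp.topic}","partition":${tp.partition}}""")
    s"""{"version":1,"partitions":[${entries.mkString(",")}]}"""
  }

  // Instead of serializing every changed partition into a single (potentially
  // oversized) znode, split the set into fixed-size batches and write one
  // notification znode per batch.
  def propagateIsrChanges(isrChanges: Set[TopicPartition]): Unit =
    isrChanges.toSeq
      .grouped(IsrNotificationBatchSize)
      .foreach(batch => createIsrChangeNotification(encodeBatch(batch)))
}
```

With grouping at 3000, the 5000-partition worst case from the test splits into two notification znodes instead of one oversized write.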
> ISR change notification queue can prevent controller from making progress
> --------------------------------------------------------------------------
>
>     Key: KAFKA-6469
>     URL: https://issues.apache.org/jira/browse/KAFKA-6469
>     Project: Kafka
>     Issue Type: Bug
>     Reporter: Kyle Ambroff-Kao
>     Assignee: Kyle Ambroff-Kao
>     Priority: Major
>
> The trouble starts when writes to /isr_change_notification in ZooKeeper (which is
> effectively a queue of ISR change events for the controller) happen at a rate high
> enough that the node holding the watch can't dequeue them.
> The watcher kafka.controller.IsrChangeNotificationListener fires in the controller
> when a new entry is written to /isr_change_notification, and the zkclient library
> sends a GetChildrenRequest to ZooKeeper to fetch all child znodes.
> We've seen failures in one of our test clusters as the partition count started to
> climb north of 60k per broker. We had brokers writing child znodes under
> /isr_change_notification that were larger than the jute.maxbuffer size in
> ZooKeeper (1MB), causing the ZooKeeper server to drop the controller's session and
> effectively bricking the cluster.
> This can be partially mitigated by chunking ISR notifications to increase the
> maximum number of partitions a broker can host.
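As a rough cross-check of the sizing claim above, the sketch below builds a worst-case batch in the spirit of testPropagateLargeNumberOfIsrChanges(). It is not the actual test: it assumes Kafka's 249-character topic name limit and uses the same illustrative JSON shape as the earlier sketch to measure one 3000-partition batch against ZooKeeper's default 1MB jute.maxbuffer.

```scala
import org.apache.kafka.common.TopicPartition

object IsrBatchSizeCheck {
  def main(args: Array[String]): Unit = {
    // Assumed 249-character maximum topic name, so each entry's JSON form is as
    // long as it can possibly be.
    val longestTopic = "x" * 249
    val worstCase: Seq[TopicPartition] =
      (0 until 5000).map(i => new TopicPartition(longestTopic, Int.MaxValue - i))

    // One batch of isr.notification.batch.size (3000) partitions, serialized with
    // the same illustrative {"topic":...,"partition":...} encoding as above.
    val batch = worstCase.take(3000)
    val entries = batch.map(tp => s"""{"topic":"${tp.topic}","partition":${tp.partition}}""")
    val payload = s"""{"version":1,"partitions":[${entries.mkString(",")}]}"""
    val bytes = payload.getBytes("UTF-8").length

    println(s"worst-case batch payload: $bytes bytes (default jute.maxbuffer is 1048576)")
    assert(bytes < 1048576, "a single batch must stay under ZooKeeper's jute.maxbuffer")
  }
}
```

Under these assumptions a single 3000-partition batch comes out in the ~850k range described above, comfortably below the 1MB limit.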