[ 
https://issues.apache.org/jira/browse/KAFKA-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355824#comment-16355824
 ] 

ASF GitHub Bot commented on KAFKA-6469:
---------------------------------------

ambroff opened a new pull request #4540: KAFKA-6469 Batch ISR change 
notifications
URL: https://github.com/apache/kafka/pull/4540
 
 
   When writes to /isr_change_notification in ZooKeeper (which is
   effectively a queue of ISR change events for the controller) happen at
   a rate high enough that the node with a watch can't dequeue them, the
   trouble starts.
   
   The watcher kafka.controller.IsrChangeNotificationListener is fired in
   the controller when a new entry is written to
   /isr_change_notification, and the zkclient library sends a
   GetChildrenRequest to ZooKeeper to fetch all child znodes.
   
   We've seen failures in one of our test clusters as the partition count
   started to climb north of 60k per broker. We had brokers writing child
   nodes under /isr_change_notification that were larger than the
   jute.maxbuffer size in ZooKeeper (1MB), causing the ZooKeeper server
   to drop the controller's session, effectively bricking the cluster.
   
   This can be partially mitigated by chunking ISR notifications to
   increase the maximum number of partitions a broker can host, which is
   the purpose of this patch.
   
   KafkaZkClient#propagateIsrChanges() now batches the set of
   TopicPartitions that will be written to the queue into sets of
   isr.notification.batch.size, which defaults to 3000. This default
   is a conservative estimate intended to keep the JSON-serialized
   collection well under 1MB.
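   
   As a minimal sketch of the batching idea (this is not the code in
   this patch): the class IsrBatchSketch and the helper batches() are
   hypothetical names, and plain strings stand in for TopicPartitions.
   Presumably each resulting batch would then be serialized and written
   as its own entry in the queue.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class IsrBatchSketch {
       // Hypothetical helper: splits items into consecutive batches of
       // at most batchSize elements, preserving order.
       public static <T> List<List<T>> batches(List<T> items, int batchSize) {
           if (batchSize <= 0) {
               throw new IllegalArgumentException("batchSize must be positive");
           }
           List<List<T>> result = new ArrayList<>();
           for (int start = 0; start < items.size(); start += batchSize) {
               int end = Math.min(start + batchSize, items.size());
               // Copy the slice so each batch is independent of the input list.
               result.add(new ArrayList<>(items.subList(start, end)));
           }
           return result;
       }

       public static void main(String[] args) {
           // Stand-in data: 7000 partition names, batch size 3000 (the
           // default quoted above).
           List<String> partitions = new ArrayList<>();
           for (int i = 0; i < 7000; i++) {
               partitions.add("topic-" + i);
           }
           List<List<String>> chunks = batches(partitions, 3000);
           System.out.println(chunks.size() + " batches");
       }
   }
   ```
   
   With 7000 entries and a batch size of 3000 this yields three batches
   (3000, 3000 and 1000 entries), so no single write carries more than
   the configured number of partitions.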
   
   You can see the worst case scenario in
   KafkaZkClientTest#testPropagateLargeNumberOfIsrChanges(), where a set
   of 5000 TopicPartitions are provided which have the longest possible
   JSON representation. This leads to a JSON payload that is around
   850k, leaving headroom for additional metadata.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ISR change notification queue can prevent controller from making progress
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-6469
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6469
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Kyle Ambroff-Kao
>            Assignee: Kyle Ambroff-Kao
>            Priority: Major
>
> When writes to /isr_change_notification in ZooKeeper (which is effectively a 
> queue of ISR change events for the controller) happen at a rate high enough 
> that the node with a watch can't dequeue them, the trouble starts.
> The watcher kafka.controller.IsrChangeNotificationListener is fired in the 
> controller when a new entry is written to /isr_change_notification, and the 
> zkclient library sends a GetChildrenRequest to ZooKeeper to fetch all child 
> znodes.
> We've seen failures in one of our test clusters as the partition count started to 
> climb north of 60k per broker. We had brokers writing child nodes under 
> /isr_change_notification that were larger than the jute.maxbuffer size in 
> ZooKeeper (1MB), causing the ZooKeeper server to drop the controller's 
> session, effectively bricking the cluster.
> This can be partially mitigated by chunking ISR notifications to increase the 
> maximum number of partitions a broker can host.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
