hangc0276 opened a new pull request #2802: URL: https://github.com/apache/bookkeeper/pull/2802
### Motivation

When the BookKeeper Auditor starts, it schedules a task that checks all ledgers at a given interval (default: 7 days), measured from the last check timestamp. The check operation processes every active ledger, as shown in the following code:

https://github.com/apache/bookkeeper/blob/c7236adc3cb659e65ae5ce53b7156569d7f50ebd/bookkeeper-server/src/main/java/org/apache/bookkeeper/meta/AbstractZkLedgerManager.java#L571-L573

To process a ledger, the Auditor calls `readLedgerMetadata`, which issues a `getData` request through the ZooKeeper client and parses the ZNode data in the callback. This parsing is time-consuming.

https://github.com/apache/bookkeeper/blob/c7236adc3cb659e65ae5ce53b7156569d7f50ebd/bookkeeper-server/src/main/java/org/apache/bookkeeper/meta/AbstractZkLedgerManager.java#L433-L474

When the ZooKeeper client uses NettySocketChannel and the number of active ledgers grows beyond 1500, the client sends the `getData` requests for all ledgers to the ZooKeeper server at once (before it starts receiving any responses), and then calls `serDe.parseConfig` to parse the ZNode data in each callback. Because the parsing is slow, it blocks the ZooKeeper client from sending heartbeats to the ZooKeeper server, and the ZooKeeper session expires. When the session expires, the BookKeeper autorecovery process shuts down.

### Modification

1. Add a throttling semaphore to the Auditor for opening ledgers, since opening a ledger calls `getData` through the ZooKeeper client.
2. Call `processor.process(ledger, mcb);` for all ledgers on a dedicated thread instead of on the ZooKeeper client callback thread.
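The two modifications can be illustrated with a small, self-contained Java sketch. It deliberately uses no BookKeeper or ZooKeeper APIs: `readMetadataAsync` is a hypothetical stand-in for `readLedgerMetadata`, a single-thread executor plays the role of the ZooKeeper client's callback thread, and the class name, the pool size of 4, and the limit of 100 outstanding reads are illustrative assumptions rather than values taken from this PR. A `Semaphore` caps how many reads are outstanding at once, and `thenAcceptAsync` moves the per-ledger parsing onto a separate pool so the callback thread never does the slow work itself.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class ThrottledAuditSketch {

    /** Summary of one simulated audit pass. */
    public static final class Result {
        public final int processed;
        public final int maxInFlight;
        public final long idChecksum;

        Result(int processed, int maxInFlight, long idChecksum) {
            this.processed = processed;
            this.maxInFlight = maxInFlight;
            this.idChecksum = idChecksum;
        }
    }

    // Hypothetical stand-in for readLedgerMetadata: the "response" is delivered on a
    // single callback thread, playing the role of the ZooKeeper client's event thread.
    static CompletableFuture<String> readMetadataAsync(long ledgerId, ExecutorService callbackThread) {
        return CompletableFuture.supplyAsync(() -> "ledger-" + ledgerId, callbackThread);
    }

    public static Result run(int ledgerCount, int maxOutstanding) {
        Semaphore throttle = new Semaphore(maxOutstanding);
        ExecutorService callbackThread = Executors.newSingleThreadExecutor();
        ExecutorService processorPool = Executors.newFixedThreadPool(4);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        AtomicInteger processed = new AtomicInteger();
        AtomicLong idChecksum = new AtomicLong();
        CountDownLatch done = new CountDownLatch(ledgerCount);

        try {
            for (long id = 0; id < ledgerCount; id++) {
                // Wait for a permit so at most maxOutstanding reads are pending at once.
                throttle.acquire();
                maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);

                readMetadataAsync(id, callbackThread)
                    // Hand the (potentially slow) parsing to processorPool so the
                    // callback thread stays free for I/O and heartbeats.
                    .thenAcceptAsync(metadata -> {
                        long parsedId = Long.parseLong(metadata.substring("ledger-".length()));
                        idChecksum.addAndGet(parsedId);
                    }, processorPool)
                    .whenComplete((ignored, error) -> {
                        if (error == null) {
                            processed.incrementAndGet();
                        }
                        inFlight.decrementAndGet();
                        throttle.release(); // lets the submitting loop issue the next read
                        done.countDown();
                    });
            }
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while auditing", e);
        } finally {
            callbackThread.shutdown();
            processorPool.shutdown();
        }
        return new Result(processed.get(), maxInFlight.get(), idChecksum.get());
    }

    public static void main(String[] args) {
        Result r = run(2000, 100);
        System.out.println("processed=" + r.processed + " maxInFlight=" + r.maxInFlight);
    }
}
```

In this sketch the submitting loop blocks in `acquire()` on its own thread, and a permit is only returned after a ledger has been fully processed, so the semaphore bounds reads and processing together. In a real Auditor, the permit should probably not be acquired on the ZooKeeper callback thread itself, otherwise the throttle could stall the very thread it is meant to protect.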
