hangc0276 opened a new pull request #2802:
URL: https://github.com/apache/bookkeeper/pull/2802


   ### Motivation
   When the BookKeeper Auditor starts, it schedules a task that checks all ledgers at a configured interval (7 days by default), measured from the timestamp of the last check. The check processes every active ledger, as shown in the following code:
   
https://github.com/apache/bookkeeper/blob/c7236adc3cb659e65ae5ce53b7156569d7f50ebd/bookkeeper-server/src/main/java/org/apache/bookkeeper/meta/AbstractZkLedgerManager.java#L571-L573
   
   Processing a ledger calls `readLedgerMetadata`, which issues a `getData` request through the ZooKeeper client and parses the ZNode data in the callback. This parsing is time-consuming.
   
https://github.com/apache/bookkeeper/blob/c7236adc3cb659e65ae5ce53b7156569d7f50ebd/bookkeeper-server/src/main/java/org/apache/bookkeeper/meta/AbstractZkLedgerManager.java#L433-L474
   
   When the ZooKeeper client uses NettySocketChannel and the number of active ledgers exceeds roughly 1500, the Auditor sends the `getData` requests for all ledgers to the ZooKeeper server at once, before it starts receiving any responses, and then calls `serDe.parseConfig` in each callback to parse the ZNode data. Because parsing the ZNode data is time-consuming, it blocks the ZooKeeper client from sending heartbeats to the server, which causes the ZooKeeper session to expire.
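The effect described above can be illustrated with a toy model. This is not ZooKeeper or BookKeeper code: it assumes, as the description states, that the parsing callbacks and the heartbeat end up serialized on one client thread, simulated here with a single-thread executor. The class `CallbackBlockingDemo`, the method `heartbeatDelayMillis`, and the per-callback cost are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Toy model (hypothetical): slow callbacks and a heartbeat task sharing one client thread. */
public class CallbackBlockingDemo {

    /**
     * Queues {@code callbacks} simulated parse callbacks, each costing
     * {@code parseCostMillis}, followed by one heartbeat task, all on a single
     * thread. Returns how many milliseconds elapsed between the burst of
     * responses being queued and the heartbeat task finally running.
     */
    public static long heartbeatDelayMillis(int callbacks, long parseCostMillis) {
        ExecutorService clientThread = Executors.newSingleThreadExecutor();
        try {
            long burstArrivedAt = System.nanoTime();
            for (int i = 0; i < callbacks; i++) {
                // Stand-in for parsing one ZNode inside a getData callback.
                clientThread.execute(() -> busyWait(parseCostMillis));
            }
            // The heartbeat cannot run until every callback queued before it has finished.
            Future<Long> heartbeat = clientThread.submit(() -> System.nanoTime() - burstArrivedAt);
            return TimeUnit.NANOSECONDS.toMillis(heartbeat.get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            clientThread.shutdownNow();
        }
    }

    /** Spins (rather than sleeps) to mimic CPU-bound parsing work. */
    private static void busyWait(long millis) {
        long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        while (System.nanoTime() < end) {
            Thread.onSpinWait();
        }
    }

    public static void main(String[] args) {
        // 200 callbacks at 5 ms each delay the heartbeat by at least one second.
        long delay = heartbeatDelayMillis(200, 5);
        System.out.println("heartbeat delayed by " + delay + " ms");
    }
}
```

Under this model the heartbeat delay grows linearly with the number of queued callbacks, so a large enough burst of responses pushes it past the ZooKeeper session timeout.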
   
   When the ZooKeeper session expires, the BookKeeper autorecovery process shuts down.
   
   ### Modification
   1. Add a throttle semaphore that limits how many ledgers the Auditor opens concurrently, since opening a ledger calls `getData` on the ZooKeeper client.
   2. Use a separate thread, instead of the ZooKeeper client callback thread, to call `processor.process(ledger, mcb);` for all ledgers.
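The two modifications above can be sketched together in plain Java. This is a minimal, self-contained simulation, not the Auditor's actual code: the asynchronous `getData` call is modeled with `CompletableFuture.supplyAsync` on a single-thread executor, and the names `ThrottledLedgerCheck`, `checkAllLedgers`, `parseAndProcess`, and `maxInFlight` are hypothetical. A `Semaphore` bounds the number of outstanding reads, and the completion callback only releases the permit and hands the expensive work to a dedicated processor thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: bound in-flight metadata reads and process ledgers off the callback thread. */
public class ThrottledLedgerCheck {

    /** Outcome of one simulated check run. */
    public static final class Result {
        public final int processed;
        public final int peakInFlight;

        Result(int processed, int peakInFlight) {
            this.processed = processed;
            this.peakInFlight = peakInFlight;
        }
    }

    public static Result checkAllLedgers(int ledgerCount, int maxInFlight) {
        // Stand-in for the ZooKeeper client thread that completes getData calls.
        ExecutorService zkCallbackThread = Executors.newSingleThreadExecutor();
        // Separate thread for the expensive per-ledger work (modification 2).
        ExecutorService processorThread = Executors.newSingleThreadExecutor();
        Semaphore throttle = new Semaphore(maxInFlight); // modification 1
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(ledgerCount);
        try {
            for (long ledgerId = 0; ledgerId < ledgerCount; ledgerId++) {
                final long id = ledgerId;
                throttle.acquire(); // blocks once maxInFlight reads are outstanding
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                CompletableFuture
                        .supplyAsync(() -> ("ledger-" + id).getBytes(StandardCharsets.UTF_8), zkCallbackThread)
                        .whenComplete((data, err) -> {
                            // Completion callback: only release the permit and hand off.
                            inFlight.decrementAndGet();
                            throttle.release();
                            processorThread.execute(() -> {
                                try {
                                    if (err == null && parseAndProcess(id, data)) {
                                        processed.incrementAndGet();
                                    }
                                } finally {
                                    done.countDown();
                                }
                            });
                        });
            }
            done.await();
            return new Result(processed.get(), peak.get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while checking ledgers", e);
        } finally {
            zkCallbackThread.shutdownNow();
            processorThread.shutdownNow();
        }
    }

    /** Stand-in for parsing the ZNode data and running processor.process(ledger, mcb). */
    private static boolean parseAndProcess(long ledgerId, byte[] data) {
        String metadata = new String(data, StandardCharsets.UTF_8);
        return metadata.equals("ledger-" + ledgerId);
    }

    public static void main(String[] args) {
        Result r = checkAllLedgers(2000, 100);
        System.out.println("processed=" + r.processed + ", peak in-flight reads=" + r.peakInFlight);
    }
}
```

Because the permit is released in the callback before the heavy work is queued, the number of outstanding reads never exceeds `maxInFlight`, while the callback thread itself stays cheap regardless of how long parsing takes.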


-- 
This is an automated message from the Apache Git Service.