Nicklee007 opened a new pull request, #3339:
URL: https://github.com/apache/bookkeeper/pull/3339

   Master Issue: #3338
   
   ### Motivation
   When decommissioning a bookie, we always switch it to read-only first and 
then wait for some of its data to expire. Even so, about 100 to 300 ledgers, 
each holding only a little data, are still left on the bookie. When we then run 
`bin/bookkeeper shell decommissionbookie -bookieid  ` to decommission the 
bookie, the command blocks in `waitForLedgersToBeReplicated()` for about 10 
minutes without printing any log. Meanwhile the znode 
`/ledgers/underreplication/ledgers` is emptied within a few seconds, which 
means the ledgers have already been re-replicated. The sleep time is defined as 
`Min(ledgers.size() * sleepTimePerLedger(10s), 
maxSleepTimeInBetweenChecks(10 min))`. Both values are far longer than the 
replication actually takes, and the absence of any output confuses users.
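
   To make the arithmetic concrete, here is a minimal sketch of the sleep-time 
formula quoted above. The class and method names are hypothetical and are not 
taken from the BookKeeper code base; only the two constants (10 s per ledger, 
10 min cap) come from the description.

```java
import java.time.Duration;

// Hypothetical illustration of Min(ledgers.size() * sleepTimePerLedger,
// maxSleepTimeInBetweenChecks); not the actual BookKeeper implementation.
public class SleepTimeSketch {
    static final Duration SLEEP_PER_LEDGER = Duration.ofSeconds(10);
    static final Duration MAX_SLEEP = Duration.ofMinutes(10);

    // Returns the smaller of (ledgerCount * 10 s) and 10 min.
    static Duration sleepTime(int ledgerCount) {
        Duration proportional = SLEEP_PER_LEDGER.multipliedBy(ledgerCount);
        return proportional.compareTo(MAX_SLEEP) < 0 ? proportional : MAX_SLEEP;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 60, 100, 300}) {
            System.out.println(n + " ledgers -> " + sleepTime(n).getSeconds() + " s");
        }
        // prints:
        // 5 ledgers -> 50 s
        // 60 ledgers -> 600 s
        // 100 ledgers -> 600 s
        // 300 ledgers -> 600 s
    }
}
```

   With 60 or more ledgers left, the wait is already pinned at the 10-minute 
cap, which matches the behavior observed with 100 to 300 leftover ledgers.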
    
   
   ### Changes
   When a bookie has a lot of data to re-replicate, we still need a backoff 
policy to protect the ZooKeeper server, so this change: 
   1. Checks `/ledgers/underreplication/ledgers` and 
`/ledgers/underreplication/locks` every 10 seconds to find out whether the 
ledgers have finished replicating.
   2. Applies a backoff policy to `validateBookieIsNotPartOfEnsemble`. If the 
auditor is running CheckAllLedgers or another time-consuming operation when we 
trigger the bookie audit, `/ledgers/underreplication/ledgers` and 
`/ledgers/underreplication/locks` can stay empty for a long time, and the 
backoff avoids sending frequent requests to ZooKeeper during that period.
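
   The two steps above could be sketched roughly as follows. This is only an 
illustration under stated assumptions, not the code in this pull request: the 
class, the method, and the three callbacks are hypothetical stand-ins for the 
ZooKeeper checks and for `validateBookieIsNotPartOfEnsemble`, and the doubling 
backoff with a 10-minute cap is an assumed policy.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

// Hypothetical sketch: poll the under-replication znodes at a fixed interval
// and back off between the more expensive ensemble validations.
public class DecommissionWaitSketch {
    static final long POLL_INTERVAL_MS = 10_000L;  // 10 s poll, as described above
    static final long MAX_BACKOFF_MS = 600_000L;   // assumed cap of 10 min

    // underReplicationEmpty: true when both under-replication znodes are empty.
    // bookieInAnyEnsemble:   true while some ledger still lists the bookie.
    // sleeper:               receives the number of milliseconds to sleep.
    // Returns how many ensemble validations were performed.
    static int waitForReplication(BooleanSupplier underReplicationEmpty,
                                  BooleanSupplier bookieInAnyEnsemble,
                                  LongConsumer sleeper) {
        long backoffMs = POLL_INTERVAL_MS;
        int validations = 0;
        while (true) {
            if (!underReplicationEmpty.getAsBoolean()) {
                // Replication is still in progress: cheap poll, reset the backoff.
                backoffMs = POLL_INTERVAL_MS;
                sleeper.accept(POLL_INTERVAL_MS);
                continue;
            }
            // Znodes are empty: run the expensive validation.
            validations++;
            if (!bookieInAnyEnsemble.getAsBoolean()) {
                return validations;
            }
            // Auditor may be busy; wait longer before validating again.
            sleeper.accept(backoffMs);
            backoffMs = Math.min(backoffMs * 2, MAX_BACKOFF_MS);
        }
    }
}
```

   Passing the sleep function in as a callback keeps the sketch testable 
without real delays; a caller could supply a wrapper around `Thread.sleep` 
that also logs each wait so the user sees progress.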
   
   

