hangc0276 opened a new pull request #2284: fix bookie decommission sleep 
timeout value is negative bug
URL: https://github.com/apache/bookkeeper/pull/2284
 
 
   When decommissioning a bookie that holds a large number of ledgers, the 
computed sleep timeout becomes negative, and the decommission operation aborts 
with the following exception:
   ```
   14:12:56.982 [main] INFO  org.apache.bookkeeper.client.BookKeeperAdmin - Count of Ledgers which need to be rereplicated: 272752
   14:12:56.983 [main] ERROR org.apache.bookkeeper.bookie.BookieShell - Received exception in DecommissionBookieCmd
   java.lang.IllegalArgumentException: timeout value is negative
        at java.lang.Thread.sleep(Native Method) ~[?:?]
        at org.apache.bookkeeper.client.BookKeeperAdmin.waitForLedgersToBeReplicated(BookKeeperAdmin.java:1528) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
        at org.apache.bookkeeper.client.BookKeeperAdmin.decommissionBookie(BookKeeperAdmin.java:1500) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
        at org.apache.bookkeeper.bookie.BookieShell$DecommissionBookieCmd.runCmd(BookieShell.java:2664) [org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
        at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:277) [org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
        at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:3081) [org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
        at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:3172) [org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
   14:12:57.013 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x206189927840052 closed
   ```
   The code that throws the exception is:
   ```
   private void waitForLedgersToBeReplicated(Collection<Long> ledgers, BookieSocketAddress thisBookieAddress,
               LedgerManager ledgerManager) throws InterruptedException, TimeoutException {
           int maxSleepTimeInBetweenChecks = 10 * 60 * 1000; // 10 minutes
           int sleepTimePerLedger = 10 * 1000; // 10 secs
           Predicate<Long> validateBookieIsNotPartOfEnsemble = ledgerId -> !areEntriesOfLedgerStoredInTheBookie(ledgerId,
                   thisBookieAddress, ledgerManager);
           while (!ledgers.isEmpty()) {
               LOG.info("Count of Ledgers which need to be rereplicated: {}", ledgers.size());
               int sleepTimeForThisCheck = ledgers.size() * sleepTimePerLedger > maxSleepTimeInBetweenChecks
                       ? maxSleepTimeInBetweenChecks : ledgers.size() * sleepTimePerLedger;
               Thread.sleep(sleepTimeForThisCheck);
               LOG.debug("Making sure following ledgers replication to be completed: {}", ledgers);
               ledgers.removeIf(validateBookieIsNotPartOfEnsemble);
           }
       }
   ```
   The number of ledgers is `272752`. When computing `sleepTimeForThisCheck`, 
   `ledgers.size() * sleepTimePerLedger` is `272752 * 10 * 1000 = 2727520000`, 
   which exceeds the maximum `int` value `2147483647` and wraps around to 
   `-1567447296`. Because the wrapped product is negative, the comparison 
   against `maxSleepTimeInBetweenChecks` is false, so `sleepTimeForThisCheck` is 
   set to `-1567447296` instead of being capped at 10 minutes.
   `Thread.sleep` then throws `java.lang.IllegalArgumentException: timeout value 
is negative`.
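
The overflow can be reproduced in isolation. The following is a minimal sketch, not necessarily the change made in this pull request: the class name `SleepTimeOverflow` and both method names are hypothetical, and the constants are copied from the snippet above. It contrasts the original `int` arithmetic with one possible fix, widening to `long` before multiplying and then capping the result.

```java
public class SleepTimeOverflow {
    // Constants copied from waitForLedgersToBeReplicated.
    static final int MAX_SLEEP_MS = 10 * 60 * 1000;   // 10 minutes
    static final int SLEEP_PER_LEDGER_MS = 10 * 1000; // 10 secs

    // Mirrors the original expression: int * int silently wraps around,
    // so a large ledger count yields a negative sleep time.
    static int sleepTimeWithIntMath(int ledgerCount) {
        return ledgerCount * SLEEP_PER_LEDGER_MS > MAX_SLEEP_MS
                ? MAX_SLEEP_MS : ledgerCount * SLEEP_PER_LEDGER_MS;
    }

    // One possible fix (an assumption, not the confirmed patch):
    // cast to long before multiplying, then cap at the maximum.
    static long sleepTimeWithLongMath(int ledgerCount) {
        return Math.min((long) ledgerCount * SLEEP_PER_LEDGER_MS, MAX_SLEEP_MS);
    }

    public static void main(String[] args) {
        int ledgerCount = 272752; // ledger count from the log above
        System.out.println("int math:  " + sleepTimeWithIntMath(ledgerCount));
        System.out.println("long math: " + sleepTimeWithLongMath(ledgerCount));
    }
}
```

With the ledger count from the log, the `int` version returns the negative value reported above, while the `long` version is capped at the 10-minute maximum, which `Thread.sleep(long)` accepts.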

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
