This is an automated email from the ASF dual-hosted git repository.

eolivelli pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/bookkeeper.git


The following commit(s) were added to refs/heads/master by this push:
     new 025d99f  fix bookie decommission sleep timeout value is negative bug
025d99f is described below

commit 025d99f5a2a4cc02f3780a11b58a9b9d6c9940c3
Author: hangc0276 <[email protected]>
AuthorDate: Tue Mar 10 21:17:53 2020 +0800

    fix bookie decommission sleep timeout value is negative bug
    
    When decommissioning a bookie that holds a large number of ledgers, the
    computed sleep timeout overflows to a negative value, and the decommission
    operation aborts with the following exception:
    ```
    14:12:56.982 [main] INFO  org.apache.bookkeeper.client.BookKeeperAdmin - Count of Ledgers which need to be rereplicated: 272752
    14:12:56.983 [main] ERROR org.apache.bookkeeper.bookie.BookieShell - Received exception in DecommissionBookieCmd
    java.lang.IllegalArgumentException: timeout value is negative
        at java.lang.Thread.sleep(Native Method) ~[?:?]
        at org.apache.bookkeeper.client.BookKeeperAdmin.waitForLedgersToBeReplicated(BookKeeperAdmin.java:1528) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
        at org.apache.bookkeeper.client.BookKeeperAdmin.decommissionBookie(BookKeeperAdmin.java:1500) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
        at org.apache.bookkeeper.bookie.BookieShell$DecommissionBookieCmd.runCmd(BookieShell.java:2664) [org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
        at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:277) [org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
        at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:3081) [org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
        at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:3172) [org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
    14:12:57.013 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x206189927840052 closed
    ```
    The code that throws the exception is:
    ```
    private void waitForLedgersToBeReplicated(Collection<Long> ledgers, BookieSocketAddress thisBookieAddress,
                LedgerManager ledgerManager) throws InterruptedException, TimeoutException {
            int maxSleepTimeInBetweenChecks = 10 * 60 * 1000; // 10 minutes
            int sleepTimePerLedger = 10 * 1000; // 10 secs
            Predicate<Long> validateBookieIsNotPartOfEnsemble = ledgerId -> !areEntriesOfLedgerStoredInTheBookie(ledgerId,
                    thisBookieAddress, ledgerManager);
            while (!ledgers.isEmpty()) {
                LOG.info("Count of Ledgers which need to be rereplicated: {}", ledgers.size());
                int sleepTimeForThisCheck = ledgers.size() * sleepTimePerLedger > maxSleepTimeInBetweenChecks
                        ? maxSleepTimeInBetweenChecks : ledgers.size() * sleepTimePerLedger;
                Thread.sleep(sleepTimeForThisCheck);
                LOG.debug("Making sure following ledgers replication to be completed: {}", ledgers);
                ledgers.removeIf(validateBookieIsNotPartOfEnsemble);
            }
        }
    ```
    The number of ledgers is `272752`. When computing sleepTimeForThisCheck,
    `ledgers.size() * sleepTimePerLedger` is `272752 * 10 * 1000 = 2727520000`,
    which exceeds the maximum int value `2147483647` and wraps around to
    `-1567447296`, so sleepTimeForThisCheck becomes `-1567447296`.
    Thread.sleep then throws `java.lang.IllegalArgumentException: timeout value
    is negative`. The fix casts `ledgers.size()` to `long` in the comparison, so
    the product is compared against the maximum without overflowing.
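
The arithmetic above can be reproduced in isolation. The following is a minimal sketch, not the BookKeeper code itself: the class name `SleepTimeOverflowDemo` and its method names are hypothetical, and only the two constants and the ternary expressions mirror the commit. `minSleepTime` is an alternative formulation that is not part of this commit; it is included only for comparison.

```java
final class SleepTimeOverflowDemo {
    // Same values as in waitForLedgersToBeReplicated.
    static final int MAX_SLEEP_MS = 10 * 60 * 1000;   // 10 minutes
    static final int SLEEP_PER_LEDGER_MS = 10 * 1000; // 10 secs

    // Pre-fix expression: the int product wraps around for large counts,
    // so the comparison is false and the wrapped value is returned.
    static int buggySleepTime(int ledgerCount) {
        return ledgerCount * SLEEP_PER_LEDGER_MS > MAX_SLEEP_MS
                ? MAX_SLEEP_MS : ledgerCount * SLEEP_PER_LEDGER_MS;
    }

    // Post-fix expression: the comparison is done in long arithmetic.
    // The int product in the else branch is only reached when the true
    // product is at most MAX_SLEEP_MS, so it cannot overflow there.
    static int fixedSleepTime(int ledgerCount) {
        return (long) ledgerCount * SLEEP_PER_LEDGER_MS > MAX_SLEEP_MS
                ? MAX_SLEEP_MS : ledgerCount * SLEEP_PER_LEDGER_MS;
    }

    // Hypothetical alternative (not in the commit): clamp with Math.min.
    static int minSleepTime(int ledgerCount) {
        return (int) Math.min((long) ledgerCount * SLEEP_PER_LEDGER_MS, MAX_SLEEP_MS);
    }

    public static void main(String[] args) {
        int ledgerCount = 272752; // count taken from the log in this commit message
        System.out.println("buggy: " + buggySleepTime(ledgerCount));
        System.out.println("fixed: " + fixedSleepTime(ledgerCount));
        System.out.println("min:   " + minSleepTime(ledgerCount));
    }
}
```

For the ledger count from the log, the pre-fix expression yields `-1567447296`, while both the cast-based fix and the `Math.min` variant yield the 10 minute cap of `600000` ms.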
    
    Reviewers: Enrico Olivelli <[email protected]>, Jia Zhai 
<[email protected]>
    
    This closes #2284 from hangc0276/bug_fix
---
 .../src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java     | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java b/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java
index 88a7c08..cac1d9d 100644
--- a/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java
+++ b/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeperAdmin.java
@@ -1536,7 +1536,7 @@ public class BookKeeperAdmin implements AutoCloseable {
                 thisBookieAddress, ledgerManager);
         while (!ledgers.isEmpty()) {
             LOG.info("Count of Ledgers which need to be rereplicated: {}", ledgers.size());
-            int sleepTimeForThisCheck = ledgers.size() * sleepTimePerLedger > maxSleepTimeInBetweenChecks
+            int sleepTimeForThisCheck = (long) ledgers.size() * sleepTimePerLedger > maxSleepTimeInBetweenChecks
                     ? maxSleepTimeInBetweenChecks : ledgers.size() * sleepTimePerLedger;
             Thread.sleep(sleepTimeForThisCheck);
             LOG.debug("Making sure following ledgers replication to be completed: {}", ledgers);
