[jira] [Updated] (HBASE-19624) TestIOFencing hangs

Chia-Ping Tsai (JIRA) Tue, 26 Dec 2017 17:44:54 -0800

     [ 
https://issues.apache.org/jira/browse/HBASE-19624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Chia-Ping Tsai updated HBASE-19624:
-----------------------------------
      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Thanks for the reviews. [~tedyu]

> TestIOFencing hangs
> -------------------
>
>                 Key: HBASE-19624
>                 URL: https://issues.apache.org/jira/browse/HBASE-19624
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Chia-Ping Tsai
>            Assignee: Chia-Ping Tsai
>             Fix For: 2.0.0
>
>         Attachments: HBASE-19624.v0.patch
>
>
> The RS calls CompactSplit#join to stop all CompactSplit threads.
> {code:title=CompactSplit.java}
>   private void waitFor(ThreadPoolExecutor t, String name) {
>     boolean done = false;
>     while (!done) {
>       try {
>         done = t.awaitTermination(60, TimeUnit.SECONDS);
>         LOG.info("Waiting for " + name + " to finish...");
>         if (!done) {
>           t.shutdownNow();
>         }
>       } catch (InterruptedException ie) {
>         LOG.warn("Interrupted waiting for " + name + " to finish...");
>       }
>     }
>   }
> {code}
> In the meantime, a thread using the async WAL may be waiting for the sync
> signal. However, the signal never arrives because the WAL sync has failed.
> {code}
>   synchronized long get(long timeoutNs) throws InterruptedException,
>       ExecutionException, TimeoutIOException {
>     final long done = System.nanoTime() + timeoutNs;
>     while (!isDone()) {
>       wait(1000);
>       if (System.nanoTime() >= done) {
>         throw new TimeoutIOException(
>             "Failed to get sync result after " + TimeUnit.NANOSECONDS.toMillis(timeoutNs)
>                 + " ms for txid=" + this.txid + ", WAL system stuck?");
>       }
>     }
>     if (this.throwable != null) {
>       throw new ExecutionException(this.throwable);
>     }
>     return this.doneTxid;
>   }
> {code}
> When we shut down the mini cluster, JVMClusterUtil#shutdown sends the
> interrupt signal to all RS threads. Catching the resulting
> InterruptedException causes CompactSplit to skip #shutdownNow, hence
> the CompactSplit threads stay up until the timeout (default is 5 min).
> {code}
>       for (int i = 0; i < 100; ++i) {
>         boolean atLeastOneLiveServer = false;
>         for (RegionServerThread t : regionservers) {
>           if (t.isAlive()) {
>             atLeastOneLiveServer = true;
>             try {
>               LOG.warn("RegionServerThreads remaining, give one more chance before interrupting");
>               t.join(1000);
>             } catch (InterruptedException e) {
>               wasInterrupted = true;
>             }
>           }
>         }
>         if (!atLeastOneLiveServer) break;
>         for (RegionServerThread t : regionservers) {
>           if (t.isAlive()) {
>             LOG.warn("RegionServerThreads taking too long to stop, interrupting");
>             t.interrupt();
>           }
>         }
>       }
> {code}
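
The interaction described above can be reproduced outside HBase. The following is only a minimal sketch and is not taken from HBASE-19624.v0.patch: the class name InterruptAwareWait, the demo() helper, and the latch standing in for a sync result that never arrives are all hypothetical. It shows one possible way for a waitFor-style loop to treat an interrupt as a request to stop quickly, by calling shutdownNow() from the catch block so that blocked workers are interrupted too.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone sketch; none of these names come from HBase.
class InterruptAwareWait {

  // Variant of the quoted waitFor: on interrupt, also interrupt the workers.
  static void waitFor(ThreadPoolExecutor pool, String name) {
    boolean done = false;
    boolean interrupted = false;
    while (!done) {
      try {
        done = pool.awaitTermination(60, TimeUnit.SECONDS);
        if (!done) {
          pool.shutdownNow();
        }
      } catch (InterruptedException ie) {
        // Assumption of this sketch: an interrupt means "stop now", so the
        // workers are interrupted instead of silently waiting again.
        System.out.println("Interrupted waiting for " + name + ", interrupting workers");
        interrupted = true;
        pool.shutdownNow();
      }
    }
    if (interrupted) {
      // Restore the interrupt status for the caller.
      Thread.currentThread().interrupt();
    }
  }

  // Returns true if the pool terminated after the waiting thread was interrupted.
  static boolean demo() {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch neverReleased = new CountDownLatch(1);
    pool.execute(() -> {
      started.countDown();
      try {
        // Stand-in for a worker blocked on a sync result that never arrives.
        neverReleased.await();
      } catch (InterruptedException e) {
        // Interrupted by shutdownNow(): let the task end.
      }
    });
    try {
      started.await();
      pool.shutdown();
      Thread stopper = new Thread(() -> waitFor(pool, "demo-pool"));
      stopper.start();
      // Plays the role of the interrupt sent during cluster shutdown.
      stopper.interrupt();
      stopper.join(10_000);
      return pool.isTerminated() && !stopper.isAlive();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("pool terminated after interrupt: " + demo());
  }
}
```

In this sketch the worker only exits because shutdownNow() interrupts it. With the loop as quoted in the description, the same interrupt would be logged and the worker would be left blocked.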



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
