[ https://issues.apache.org/jira/browse/HBASE-19624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chia-Ping Tsai updated HBASE-19624:
-----------------------------------
Resolution: Fixed
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)
Thanks for the reviews, [~tedyu].
> TestIOFencing hangs
> -------------------
>
> Key: HBASE-19624
> URL: https://issues.apache.org/jira/browse/HBASE-19624
> Project: HBase
> Issue Type: Bug
> Reporter: Chia-Ping Tsai
> Assignee: Chia-Ping Tsai
> Fix For: 2.0.0
>
> Attachments: HBASE-19624.v0.patch
>
>
> The RS calls CompactSplit#join to stop all CompactSplit threads.
> {code:title=CompactSplit.java}
> private void waitFor(ThreadPoolExecutor t, String name) {
>   boolean done = false;
>   while (!done) {
>     try {
>       done = t.awaitTermination(60, TimeUnit.SECONDS);
>       LOG.info("Waiting for " + name + " to finish...");
>       if (!done) {
>         t.shutdownNow();
>       }
>     } catch (InterruptedException ie) {
>       // The interrupt is only logged; shutdownNow() is not reached on this pass.
>       LOG.warn("Interrupted waiting for " + name + " to finish...");
>     }
>   }
> }
> {code}
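The loop above depends on #shutdownNow to unblock stuck workers: ThreadPoolExecutor#shutdown stops new submissions but leaves running tasks alone, whereas #shutdownNow interrupts the worker threads. The self-contained sketch below assumes the pool has already been shut down before waitFor is called; the class name ShutdownNowDemo is hypothetical, and a CountDownLatch that is never counted down stands in for a compaction thread waiting on a WAL sync. It shows that awaitTermination keeps returning false until #shutdownNow is called.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownNowDemo {
  /** Returns true if the pool terminated only after shutdownNow() interrupted its worker. */
  static boolean run() throws InterruptedException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch neverSignalled = new CountDownLatch(1); // like a sync signal that never arrives
    pool.execute(() -> {
      started.countDown();
      try {
        neverSignalled.await(); // blocks until the worker thread is interrupted
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt status and let the task end
      }
    });
    started.await();
    pool.shutdown(); // rejects new tasks but does not interrupt the running one
    boolean doneAfterShutdown = pool.awaitTermination(200, TimeUnit.MILLISECONDS);
    pool.shutdownNow(); // interrupts the worker, so await() above throws
    boolean doneAfterShutdownNow = pool.awaitTermination(5, TimeUnit.SECONDS);
    return !doneAfterShutdown && doneAfterShutdownNow;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(run()); // prints "true"
  }
}
```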
> Meanwhile, the compaction/split threads may be waiting on the async WAL for a sync
> signal. However, that signal never arrives because the WAL sync has failed.
> {code}
> synchronized long get(long timeoutNs) throws InterruptedException,
>     ExecutionException, TimeoutIOException {
>   final long done = System.nanoTime() + timeoutNs;
>   while (!isDone()) {
>     wait(1000);
>     if (System.nanoTime() >= done) {
>       throw new TimeoutIOException(
>           "Failed to get sync result after " + TimeUnit.NANOSECONDS.toMillis(timeoutNs)
>               + " ms for txid=" + this.txid + ", WAL system stuck?");
>     }
>   }
>   if (this.throwable != null) {
>     throw new ExecutionException(this.throwable);
>   }
>   return this.doneTxid;
> }
> {code}
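SyncFuture#get above returns only when the sync completes, when the timeout elapses, or when the waiting thread is interrupted, because Object#wait throws InterruptedException. The sketch below uses a hypothetical, simplified SimpleSyncFuture (not the HBase class; it throws java.util.concurrent.TimeoutException instead of TimeoutIOException) to show that an interrupt, which is what #shutdownNow delivers to pool workers, releases a thread blocked on a future that is never completed.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical, simplified stand-in for HBase's SyncFuture; not the real class. */
public class SimpleSyncFuture {
  private boolean done;
  private Throwable throwable;
  private long doneTxid;

  /** Marks the sync as finished and wakes every thread blocked in get(). */
  synchronized void complete(long txid, Throwable t) {
    this.doneTxid = txid;
    this.throwable = t;
    this.done = true;
    notifyAll();
  }

  /** Same loop shape as the quoted get(): re-check every second until done or past the deadline. */
  synchronized long get(long timeoutNs)
      throws InterruptedException, ExecutionException, TimeoutException {
    final long deadline = System.nanoTime() + timeoutNs;
    while (!done) {
      wait(1000); // throws InterruptedException if the waiting thread is interrupted
      if (System.nanoTime() >= deadline) {
        throw new TimeoutException("Failed to get sync result after "
            + TimeUnit.NANOSECONDS.toMillis(timeoutNs) + " ms");
      }
    }
    if (throwable != null) {
      throw new ExecutionException(throwable);
    }
    return doneTxid;
  }

  /** Blocks a thread in get() on a future that is never completed, then interrupts it. */
  static String interruptBlockedWaiter() throws InterruptedException {
    SimpleSyncFuture future = new SimpleSyncFuture(); // never completed: the sync "failed"
    AtomicReference<String> outcome = new AtomicReference<>("still blocked");
    Thread waiter = new Thread(() -> {
      try {
        future.get(TimeUnit.MINUTES.toNanos(5)); // the 5-minute default mentioned in the report
        outcome.set("completed");
      } catch (InterruptedException e) {
        outcome.set("interrupted");
      } catch (ExecutionException | TimeoutException e) {
        outcome.set("failed: " + e);
      }
    });
    waiter.start();
    Thread.sleep(100);  // let the waiter reach wait()
    waiter.interrupt(); // what ThreadPoolExecutor#shutdownNow does to its workers
    waiter.join(5_000);
    return outcome.get();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(interruptBlockedWaiter()); // prints "interrupted"
  }
}
```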
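> When the mini cluster is shut down, JVMClusterUtil#shutdown sends an interrupt signal to
> every RS thread that is still alive, roughly once per second (see the loop below). Each
> interrupt makes awaitTermination throw InterruptedException, and the catch block in
> CompactSplit#waitFor only logs and loops, so #shutdownNow is never reached. As a result,
> the CompactSplit threads stay blocked until the WAL sync times out (default is 5 min).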
> {code}
> for (int i = 0; i < 100; ++i) {
>   boolean atLeastOneLiveServer = false;
>   for (RegionServerThread t : regionservers) {
>     if (t.isAlive()) {
>       atLeastOneLiveServer = true;
>       try {
>         LOG.warn("RegionServerThreads remaining, give one more chance before interrupting");
>         t.join(1000);
>       } catch (InterruptedException e) {
>         wasInterrupted = true;
>       }
>     }
>   }
>   if (!atLeastOneLiveServer) break;
>   for (RegionServerThread t : regionservers) {
>     if (t.isAlive()) {
>       LOG.warn("RegionServerThreads taking too long to stop, interrupting");
>       t.interrupt();
>     }
>   }
> }
> {code}
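One way to avoid the hang is to treat an interrupt like a timeout, so #shutdownNow still runs, and to restore the interrupt status once the pool has terminated. The sketch below is an assumption for illustration, not the content of HBASE-19624.v0.patch; WaitForSketch and demo() are hypothetical names, and the driver loop imitates the JVMClusterUtil code above by interrupting the stopping thread every 100 ms instead of roughly every second.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WaitForSketch {
  /**
   * Hypothetical variant of CompactSplit#waitFor: an interrupt is handled like a
   * timeout, so shutdownNow() still runs, and the caller's interrupt status is
   * restored once the pool has terminated.
   */
  static void waitFor(ExecutorService pool, long timeout, TimeUnit unit) {
    boolean interrupted = false;
    boolean done = false;
    while (!done) {
      try {
        done = pool.awaitTermination(timeout, unit);
        if (!done) {
          pool.shutdownNow();
        }
      } catch (InterruptedException ie) {
        interrupted = true;
        pool.shutdownNow(); // a stream of interrupts can no longer skip this call
      }
    }
    if (interrupted) {
      Thread.currentThread().interrupt();
    }
  }

  /** A stopper thread runs waitFor() while a driver keeps interrupting it, as JVMClusterUtil does. */
  static boolean demo() throws InterruptedException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch neverSignalled = new CountDownLatch(1); // stands in for the failed WAL sync
    pool.execute(() -> {
      started.countDown();
      try {
        neverSignalled.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // shutdownNow() got us out; let the task end
      }
    });
    started.await();
    pool.shutdown();

    Thread stopper = new Thread(() -> waitFor(pool, 60, TimeUnit.SECONDS));
    stopper.start();
    for (int i = 0; i < 50 && stopper.isAlive(); i++) {
      stopper.join(100);
      if (stopper.isAlive()) {
        stopper.interrupt();
      }
    }
    stopper.join(5_000);
    return pool.isTerminated() && !stopper.isAlive();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(demo()); // prints "true"
  }
}
```

With the original catch block swapped in, each interrupt would send the loop straight back into awaitTermination, so under the same driver demo() would be expected to return false.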
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)