[ 
https://issues.apache.org/jira/browse/HBASE-19542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296382#comment-16296382
 ] 

Chia-Ping Tsai commented on HBASE-19542:
----------------------------------------

bq. So where is the test stuck?
The critical code is shown below.
{code:title=FanOutOneBlockAsyncDFSOutputHelper.java}
  static void completeFile(DFSClient client, ClientProtocol namenode, String src, String clientName,
      ExtendedBlock block, long fileId) {
    for (int retry = 0;; retry++) {
      try {
        if (namenode.complete(src, clientName, block, fileId)) {
          endFileLease(client, fileId);
          return;
        } else {
          LOG.warn("complete file " + src + " not finished, retry = " + retry);
        }
      } catch (RemoteException e) {
        IOException ioe = e.unwrapRemoteException();
        if (ioe instanceof LeaseExpiredException) {
          LOG.warn("lease for file " + src + " is expired, give up", e);
          return;
        } else {
          LOG.warn("complete file " + src + " failed, retry = " + retry, e);
        }
      } catch (Exception e) {
        LOG.warn("complete file " + src + " failed, retry = " + retry, e);
      }
      sleepIgnoreInterrupt(retry);
    }
  }
{code}
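If the filesystem is in safe mode, the exception thrown here is a 
RemoteException wrapping a SafeModeException. Since that is not a 
LeaseExpiredException, the loop only logs a warning and retries, and it has no 
exit condition. So it hangs in the loop forever when we are closing the WAL.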

bq. This means we may leave a WAL always open if a FileSystem is temporarily unavailable but the RS is not down?
Or could we shut down the RS once it reaches a retry limit?
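
To make the retry-limit idea concrete, here is a minimal, self-contained sketch. It is not the attached patch and does not use the real HDFS classes: the Completer interface, the completeWithLimit method and the maxRetries/sleepMs parameters are all hypothetical names standing in for namenode.complete(...) and the helper above. The only point is that the loop gives up and throws, so the caller (for example the RS) can decide to abort instead of hanging.

```java
import java.io.IOException;

public class BoundedCompleteSketch {

  /** Hypothetical stand-in for namenode.complete(...); true means the file is closed. */
  @FunctionalInterface
  public interface Completer {
    boolean complete() throws IOException;
  }

  /**
   * Calls complete() at most maxRetries times. Returns the number of attempts
   * used on success, and throws an IOException once the limit is reached so
   * the caller can react (for example by aborting) instead of looping forever.
   */
  public static int completeWithLimit(Completer completer, int maxRetries, long sleepMs)
      throws IOException {
    if (maxRetries < 1) {
      throw new IllegalArgumentException("maxRetries must be >= 1");
    }
    IOException last = null;
    for (int retry = 0; retry < maxRetries; retry++) {
      try {
        if (completer.complete()) {
          return retry + 1;
        }
      } catch (IOException e) {
        // For example, an exception surfaced while the filesystem is in safe mode.
        last = e;
      }
      sleepIgnoreInterrupt(sleepMs);
    }
    throw new IOException("complete file not finished after " + maxRetries + " attempts", last);
  }

  /** Sleeps for the given time and swallows interrupts, mirroring the helper quoted above. */
  private static void sleepIgnoreInterrupt(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      // Ignored on purpose in this sketch.
    }
  }

  public static void main(String[] args) throws IOException {
    int[] calls = {0};
    // Simulated filesystem that only accepts the third complete() call.
    int attempts = completeWithLimit(() -> ++calls[0] >= 3, 5, 10L);
    System.out.println("completed after " + attempts + " attempts");
  }
}
```

Whether the right reaction to hitting the limit is to abort the RS or to keep the WAL open and retry later is exactly the open question above; the sketch only shows where that decision point would sit.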



> fix TestSafemodeBringsDownMaster
> --------------------------------
>
>                 Key: HBASE-19542
>                 URL: https://issues.apache.org/jira/browse/HBASE-19542
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Chia-Ping Tsai
>            Assignee: Chia-Ping Tsai
>             Fix For: 2.0.0-beta-1
>
>         Attachments: HBASE-19542.v0.patch
>
>
> We need to check the stability of the underlying file system when closing 
> the async WAL.  Otherwise, HBase can't shut down gracefully.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
