[ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102973#comment-13102973
 ] 

stack commented on HBASE-4357:
------------------------------

@Ming In TRUNK, we have RecoverableZooKeeper.  It does the following when it's 
trying to get the version:

{code}
  /**
   * exists is an idempotent operation. Retry before throw out exception
   * @param path
   * @param watcher
   * @return
   * @throws KeeperException
   * @throws InterruptedException
   */
  public Stat exists(String path, Watcher watcher)
  throws KeeperException, InterruptedException {
    RetryCounter retryCounter = retryCounterFactory.create();
    while (true) {
      try {
        return zk.exists(path, watcher);
      } catch (KeeperException e) {
        switch (e.code()) {
          case CONNECTIONLOSS:
          case OPERATIONTIMEOUT:
            LOG.warn("Possibly transient ZooKeeper exception: " + e);
            if (!retryCounter.shouldRetry()) {
              LOG.error("ZooKeeper exists failed after "
                + retryCounter.getMaxRetries() + " retries");
              throw e;
            }
            break;

          default:
            throw e;
        }
      }
      LOG.info("The "+retryCounter.getAttemptTimes()+" times to retry " +
          "ZooKeeper after sleeping "+retryIntervalMillis+" ms");
      retryCounter.sleepUntilNextRetry();
      retryCounter.useRetry();
    }
  }
{code}

That is, it retries.
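
Stripped of the ZooKeeper specifics, the pattern looks roughly like the sketch below. This is only an illustration: RetrySketch, TransientException, Op and callWithRetries are made-up names, not HBase or ZooKeeper APIs, and TransientException stands in for the CONNECTIONLOSS and OPERATIONTIMEOUT cases. It assumes the wrapped operation is idempotent, as exists is.

```java
// Illustrative sketch only: none of these names come from HBase or ZooKeeper.
final class RetrySketch {

  /** Stand-in for a transient failure such as CONNECTIONLOSS or OPERATIONTIMEOUT. */
  static final class TransientException extends Exception {
    TransientException(String message) {
      super(message);
    }
  }

  /** An idempotent operation that is safe to repeat. */
  interface Op<T> {
    T call() throws Exception;
  }

  /**
   * Runs op, retrying on TransientException up to maxRetries times and
   * sleeping sleepMillis between attempts. Any other exception, or a
   * transient one once the retries are used up, is rethrown to the caller.
   */
  static <T> T callWithRetries(Op<T> op, int maxRetries, long sleepMillis)
      throws Exception {
    int retriesUsed = 0;
    while (true) {
      try {
        return op.call();
      } catch (TransientException e) {
        if (retriesUsed >= maxRetries) {
          throw e; // out of retries: surface the last failure
        }
      }
      retriesUsed++;
      Thread.sleep(sleepMillis);
    }
  }
}
```

A caller would wrap the idempotent call in a lambda, e.g. RetrySketch.callWithRetries(() -> someIdempotentCall(), 3, 1000L), where someIdempotentCall is again hypothetical. Non-idempotent operations need more care than this, because a retry after a lost connection may repeat an operation that had already succeeded on the server.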

We should probably do your #2 above too.

> Region in transition - in closing state
> ---------------------------------------
>
>                 Key: HBASE-4357
>                 URL: https://issues.apache.org/jira/browse/HBASE-4357
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ming Ma
>
> Got the following during testing:
> 1. On a given machine, kill "RS process id". Then kill "HMaster process id".
> 2. Start RS first via "bin/hbase-daemon.sh --config ./conf start 
> regionserver". Start HMaster via "bin/hbase-daemon.sh --config ./conf start 
> master".
> One region of a table stayed in closing state.
> According to zookeeper,
> 794a6ff17a4de0dd0a19b984ba18eea9 
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), 
> server=sea-esxi-0,60000,1315428682281 
> According to the .META. table, the region has been reassigned from sea-esxi-0 to 
> sea-esxi-4.
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  sea-esxi-4:60030  H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
