[ 
https://issues.apache.org/jira/browse/HBASE-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657168#comment-13657168
 ] 

Julian Zhou commented on HBASE-8533:
------------------------------------

On the topic of HBaseAdmin riding over a cluster restart, it looks like the 
main logic is in HConnectionManager.
In this REST scenario, on the very first restart of the HBase service after 
REST has started (before any actual HBase admin RPC call has happened), 
#isKeepAliveMasterConnectedAndRunning returns false because 
MasterMonitorServiceState's stub is null (it has not yet been built from 
protobuf via an RPC operation); otherwise 
MasterMonitorServiceState.isMasterRunning() would throw protobuf's 
ServiceException. Either way, a new protobuf BlockingStub is then created via 
#makeStub().
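To illustrate the guard just described, here is a minimal paraphrase in a 
hypothetical class (not the real HConnectionManager code): the keep-alive check 
returns false while the stub has never been built, so the caller falls through 
to building a fresh stub instead of calling through a dead one.

```java
// Hypothetical sketch of the null-stub guard; field and method names are
// simplified stand-ins for the HConnectionManager internals.
class MasterServiceStateSketch {
  Object stub; // would be a protobuf blocking stub in the real code

  boolean isMasterConnectedAndRunning() {
    if (stub == null) {
      // No RPC has happened yet, so there is nothing to probe;
      // returning false sends the caller to makeStub().
      return false;
    }
    // The real code would invoke stub.isMasterRunning() here and treat a
    // ServiceException as "not running"; this sketch just reports success.
    return true;
  }
}
```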

There are two rounds of retries when HBaseAdmin's HConnection tries to invoke 
master services while the master is not running:
1) RecoverableZooKeeper retries 3 times if ZooKeeperWatcher hits 
KeeperException.CONNECTIONLOSS;
2) in #makeStub(), there are 10 getMaster retries if any exception is caught 
from the ZooKeeper connection.
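The outer round can be sketched as a simple bounded retry loop with a growing 
pause. This is a hypothetical stand-in (class name, pause constants, and the 
linear backoff are mine), not the actual #makeStub() implementation:

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of the outer getMaster retry round: up to 10 attempts,
// sleeping a growing pause between failures, rethrowing the last exception
// once the attempts are exhausted.
class GetMasterRetrier {
  static final int MAX_TRIES = 10;      // matches the 10 getMaster retries
  static final long BASE_PAUSE_MS = 10; // illustrative; real pauses are longer

  static <T> T retry(Callable<T> getMaster) throws Exception {
    Exception caught = null;
    for (int tries = 0; tries < MAX_TRIES; tries++) {
      try {
        return getMaster.call();
      } catch (Exception e) {
        caught = e; // e.g. a KeeperException from the ZooKeeper lookup
        Thread.sleep(BASE_PAUSE_MS * (tries + 1)); // growing backoff
      }
    }
    throw caught; // enough tries, give up with the last failure
  }
}
```

The inner round (RecoverableZooKeeper's 3 CONNECTIONLOSS retries) runs inside 
each `getMaster.call()`, so a flapping master can consume both budgets.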

I just tried modifying HConnectionImplementation#checkIfBaseNodeAvailable to 
rethrow the KeeperException with CONNECTIONLOSS caught from 
"ZKUtil.checkExists" instead of throwing MasterNotRunningException, and then 
added this part in #makeStub:
...
                try {
                  Thread.sleep(pauseTime);
                } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
                  throw new RuntimeException(
                      "Thread was interrupted while trying to connect to master.", e);
                }

                // --------- added for reconnecting to ZK due to connection loss caused by master restart -- Begin
                if (exceptionCaught instanceof KeeperException) {
                  if (((KeeperException) exceptionCaught).code() ==
                      KeeperException.Code.CONNECTIONLOSS) {
                    try {
                      getKeepAliveZooKeeperWatcher().reconnectAfterExpiration();
                    } catch (Exception e) {
                      LOG.error("Encountered unexpected exception while trying "
                          + "to recover from ZooKeeper connection loss.", e);
                    }
                  }
                }
                // --------- added for reconnecting to ZK due to connection loss caused by master restart -- End

              } else {
                // Enough tries, we stop now
...

With this change, the REST process can reconnect to the ZooKeeper service via 
the same configuration after the master restarts.
                
> [REST] HBaseAdmin does not ride over cluster restart
> ----------------------------------------------------
>
>                 Key: HBASE-8533
>                 URL: https://issues.apache.org/jira/browse/HBASE-8533
>             Project: HBase
>          Issue Type: Improvement
>          Components: REST, scripts
>    Affects Versions: 0.94.3, 0.98.0
>            Reporter: Julian Zhou
>            Priority: Minor
>             Fix For: 0.94.3, 0.98.0
>
>
> For the RESTful servlet (org.apache.hadoop.hbase.rest.Main (0.94), 
> org.apache.hadoop.hbase.rest.RESTServer (trunk)) on Jetty, we need to first 
> explicitly start the service (% ./bin/hbase-daemon.sh start rest -p 8000) 
> for applications to use. Here is a scenario: sometimes the HBase cluster is 
> stopped/started for maintenance, but REST is a separate standalone process, 
> which binds its HBaseAdmin in the constructor.
> An HBase stop/start breaks this binding for the existing REST servlet. The 
> REST servlet keeps trying against the old bound HBaseAdmin until, a long 
> time later, an "Unavailable" is surfaced via an IOException caught in, for 
> example, RootResource.
> Could we pair the HBase service with the HBase REST service via some 
> start/stop options? There seems to be no reason to keep the REST servlet 
> process running after HBase has stopped, and when HBase restarts, the 
> original REST service cannot resume by binding to the new HBase service 
> through its old HBaseAdmin reference.
> So could we stop REST when HBase is stopped? And even if HBase was killed by 
> accident, restarting HBase with the REST option could detect the old REST 
> process, kill it, and start a new one.
> From this point of view, applications relying on the REST API in the above 
> scenario could detect the outage immediately when setting up the HTTP 
> connection session, instead of wasting a long time failing back from an 
> IOException with "Unavailable" from the REST servlet.
> Current options from the discussion history with Andrew, Stack and 
> Jean-Daniel:
> 1) create an HBaseAdmin on demand in the REST servlet instead of keeping a 
> singleton instance (another possible enhancement for the HBase client: 
> automatic reconnection of an open HBaseAdmin handle after a cluster bounce);
> 2) pair the REST webapp with the HBase web UI so REST is always on with the 
> HBase service;
> 3) add an option for the REST service (such as HBASE_MANAGES_REST) in 
> hbase-env.sh; when HBASE_MANAGES_REST is set to true, the scripts will 
> start/stop the REST server.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira