[ https://issues.apache.org/jira/browse/HBASE-26420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457964#comment-17457964 ]

May edited comment on HBASE-26420 at 12/12/21, 1:31 PM:
--------------------------------------------------------

*Root cause*:

When the current HMaster C3HM1 assigns meta to C3RS2 during startup, C3RS2 submits an OpenMetaHandler to open the meta region:

{code:java}
org.apache.hadoop.hbase.executor.ExecutorService$Executor.submit(ExecutorService.java)
org.apache.hadoop.hbase.executor.ExecutorService.submit(ExecutorService.java:149)
org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1935)
org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:26662)
org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2423)
org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311)
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291)
{code}

C3RS2 then returns the RPC response to C3HM1. If C3RS2 crashes before it transitions the meta region to OPENED and updates the region's state in ZooKeeper, C3HM1 gets stuck waiting for the state of the meta region's znode "/hbase/region-in-transition/1588230740" to change.

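A minimal, self-contained model of this race may make the failure easier to see. It is plain Java with no HBase or ZooKeeper dependency. A map stands in for the znode, a CountDownLatch stands in for the ZooKeeper watch, and a single-thread executor stands in for the RegionServer's handler pool. The class name, the simplified OPENING/OPENED state strings and the deadline are hypothetical, and none of this is HBase 1.7.1 code. It shows that the RPC acknowledgement reaches the master even when the handler never writes OPENED, so a wait with no deadline would never return.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the race, not HBase code.
public class MetaAssignmentModel {
    // Path taken from the report; the map below only imitates ZooKeeper.
    static final String RIT_META = "/hbase/region-in-transition/1588230740";

    // Simulated znode data: path -> simplified region state.
    final Map<String, String> znodes = new ConcurrentHashMap<>();
    // Counted down when the state becomes OPENED, standing in for a ZooKeeper watch event.
    final CountDownLatch openedEvent = new CountDownLatch(1);
    // Daemon worker standing in for the RegionServer's open-region executor.
    final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "open-meta-handler");
        t.setDaemon(true);
        return t;
    });

    MetaAssignmentModel() {
        znodes.put(RIT_META, "OPENING");
    }

    // Imitates openRegion: queue the handler, then acknowledge immediately.
    boolean openRegion(boolean crashBeforeOpened) {
        executor.submit(() -> {
            if (crashBeforeOpened) {
                return; // RegionServer dies here: OPENED is never written
            }
            znodes.put(RIT_META, "OPENED");
            openedEvent.countDown();
        });
        return true; // the master gets this reply whether or not the handler finishes
    }

    // Imitates the master waiting for the znode change, but with a deadline.
    boolean waitForOpened(long timeoutMs) {
        try {
            return openedEvent.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        MetaAssignmentModel model = new MetaAssignmentModel();
        boolean acked = model.openRegion(true);
        boolean opened = model.waitForOpened(500);
        System.out.println("acked=" + acked + " opened=" + opened
                + " state=" + model.znodes.get(RIT_META));
    }
}
{code}

When crashBeforeOpened is true, the program prints acked=true opened=false state=OPENING after roughly half a second. The deadline is only there so that the model terminates. It is not a claim about how HBase does, or should, recover from the stall.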




> Unexpected crash of meta RegionServer causes the cluster out of service
> -----------------------------------------------------------------------
>
>                 Key: HBASE-26420
>                 URL: https://issues.apache.org/jira/browse/HBASE-26420
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.7.1
>            Reporter: May
>            Priority: Major
>         Attachments: hbase-root-master-C3HM1.log
>
>
> We have a cluster of two HMasters, C3HM1 and C3HM2, and three RegionServers, 
> C3RS1, C3RS2, C3RS3. 
> We use an external ZooKeeper cluster which is a pseudo-distributed cluster:
> {code:java}
>   <property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>C3hb-zk</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.property.clientPort</name>
>     <value>11181</value>
>   </property>
> {code}
> For other HBase options, we use the default settings. The buggy scenario is 
> as follows:
> 1. Start the cluster, and C3HM1 becomes the active master;
> 2. C3RS2 crashes right before creating the znode "/hbase/meta-region-server" 
> on ZooKeeper;
> {code:java}
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:665)
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:644)
> org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:1182)
> org.apache.hadoop.hbase.zookeeper.MetaTableLocator.setMetaLocation(MetaTableLocator.java:464)
> org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:2182)
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$PostOpenDeployTasksThread.run(OpenRegionHandler.java:329)
> {code}
> 3. The meta region is still not online after 10 minutes, and the data of the znode "/hbase/master" remains C3HM1.
> The bug does not reproduce on HBase 2.4.5.
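The ZooKeeper settings in the description combine into a single connect string. The sketch below assumes that HBase appends hbase.zookeeper.property.clientPort to every host in hbase.zookeeper.quorum that does not already carry an explicit port. The class and method names are hypothetical and are not HBase's API.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds "host:port,host:port" from quorum + clientPort.
public class ZkConnectString {
    // Assumption: hosts without an explicit port get the client port appended.
    static String build(String quorum, int clientPort) {
        List<String> servers = new ArrayList<>();
        for (String host : quorum.split(",")) {
            String h = host.trim();
            if (h.isEmpty()) {
                continue; // tolerate stray commas
            }
            servers.add(h.contains(":") ? h : h + ":" + clientPort);
        }
        return String.join(",", servers);
    }

    public static void main(String[] args) {
        // Values from the reported hbase-site.xml.
        System.out.println(build("C3hb-zk", 11181));
    }
}
{code}

For the reported settings this prints C3hb-zk:11181, so every HMaster and RegionServer in the cluster talks to ZooKeeper through that one endpoint.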



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
