[
https://issues.apache.org/jira/browse/HBASE-26420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457965#comment-17457965
]
May commented on HBASE-26420:
-----------------------------
[~shahrs87]
Hello, could you help confirm the root cause of this bug? Thanks.
> Unexpected crash of meta RegionServer causes the cluster out of service
> -----------------------------------------------------------------------
>
> Key: HBASE-26420
> URL: https://issues.apache.org/jira/browse/HBASE-26420
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.7.1
> Reporter: May
> Priority: Major
> Attachments: hbase-root-master-C3HM1.log
>
>
> We have a cluster with two HMasters (C3HM1 and C3HM2) and three RegionServers
> (C3RS1, C3RS2, and C3RS3).
> We use an external, pseudo-distributed ZooKeeper cluster:
> {code:xml}
> <property>
>   <name>hbase.zookeeper.quorum</name>
>   <value>C3hb-zk</value>
> </property>
> <property>
>   <name>hbase.zookeeper.property.clientPort</name>
>   <value>11181</value>
> </property>
> {code}
> All other HBase options use their default settings. The failure scenario is
> as follows:
> 1. Start the cluster; C3HM1 becomes the active master.
> 2. C3RS2 crashes right before creating the znode "/hbase/meta-region-server"
> on ZooKeeper, at this point in the call stack:
> {code:java}
> [org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:665),
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:644),
> org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:1182),
> org.apache.hadoop.hbase.zookeeper.MetaTableLocator.setMetaLocation(MetaTableLocator.java:464),
> org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:2182),
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$PostOpenDeployTasksThread.run(OpenRegionHandler.java:329)]
> {code}
> 3. After 10 minutes, the meta server is still not online. The data of znode
> "/hbase/master" is C3HM1.
> The bug does not appear on HBase-2.4.5.
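For anyone who wants to inspect the znodes mentioned in the description, the two ZooKeeper properties quoted there combine into a host:port connect string. The following is only a minimal sketch of that combination: the class `ZkConnectString` and its `build` method are hypothetical names made up for illustration and are not part of HBase.

```java
import java.util.ArrayList;
import java.util.List;

public class ZkConnectString {
    // Hypothetical helper (not an HBase API): appends the client port
    // to every host listed in hbase.zookeeper.quorum.
    public static String build(String quorum, int clientPort) {
        List<String> parts = new ArrayList<>();
        for (String host : quorum.split(",")) {
            String trimmed = host.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            parts.add(trimmed + ":" + clientPort);
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        // Values taken from the configuration quoted in the issue.
        System.out.println(build("C3hb-zk", 11181)); // prints C3hb-zk:11181
    }
}
```

The resulting string can be passed to the ZooKeeper command-line client (for example `zkCli.sh -server C3hb-zk:11181`) to check whether "/hbase/meta-region-server" exists and what "/hbase/master" contains.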