[ https://issues.apache.org/jira/browse/HBASE-13935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592308#comment-14592308 ]
Stephen Yuan Jiang commented on HBASE-13935:
--------------------------------------------
[~mbertozzi], the failed server was gone. Before the patch, table creation
would fail if the table was in either the ENABLING or the ENABLED state:
{code}
if (!assignmentManager.getTableStateManager().setTableStateIfNotInStates(tableName,
    ZooKeeperProtos.Table.State.ENABLING,
    ZooKeeperProtos.Table.State.ENABLING,
    ZooKeeperProtos.Table.State.ENABLED)) {
  throw new TableExistsException(tableName);
}
{code}
If we have an orphaned ENABLING znode, then before HMaster#initNamespace() is
called, "this.assignmentManager.joinCluster();" is executed, which calls
"AssignmentManager#recoverTableInEnablingState()" to remove the ENABLING znode.
That is why my unit test only sets the state to ENABLED; my guess is that the
orphaned znode in the test run was probably in the ENABLED state.
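The ordering described above can be modeled with a small, self-contained sketch. It does not use any HBase classes: the in-memory map, `leaveOrphan`, `namespaceCreationFails`, and `failsWithOrphan` are hypothetical stand-ins. It assumes the check-and-set semantics implied by the snippet above (set the new state only if the table is in none of the listed states) and that recovery clears only ENABLING entries, as described in this comment.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Simplified in-memory model of the startup ordering; not the HBase implementation.
class StartupOrderSketch {
    enum State { ENABLING, ENABLED }

    // Stand-in for the table state znodes kept in ZooKeeper.
    private final Map<String, State> znodes = new HashMap<>();

    // Simulates a znode left behind by a previous, failed master.
    void leaveOrphan(String table, State state) {
        znodes.put(table, state);
    }

    // Assumed semantics: set newState and return true only if the table's
    // current state is none of the listed states.
    boolean setTableStateIfNotInStates(String table, State newState, State... states) {
        State current = znodes.get(table);
        if (current != null && Arrays.asList(states).contains(current)) {
            return false;
        }
        znodes.put(table, newState);
        return true;
    }

    // Models the cleanup described above: only ENABLING entries are removed.
    void recoverTableInEnablingState() {
        znodes.values().removeIf(s -> s == State.ENABLING);
    }

    // Models master startup: recovery runs first, then the namespace table
    // creation check. Returns true if that check would reject the creation.
    boolean namespaceCreationFails(String table) {
        recoverTableInEnablingState();
        return !setTableStateIfNotInStates(table, State.ENABLING,
                State.ENABLING, State.ENABLED);
    }

    static boolean failsWithOrphan(State orphan) {
        StartupOrderSketch model = new StartupOrderSketch();
        model.leaveOrphan("hbase:namespace", orphan);
        return model.namespaceCreationFails("hbase:namespace");
    }

    public static void main(String[] args) {
        System.out.println("orphaned ENABLING fails: " + failsWithOrphan(State.ENABLING));
        System.out.println("orphaned ENABLED fails: " + failsWithOrphan(State.ENABLED));
    }
}
```

In this model only the orphaned ENABLED entry survives recovery and blocks creation, which matches the reasoning for why the unit test sets ENABLED.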
[~mbertozzi] I thought this would not be a problem with PV2; however, we hit
this twice with PV2 enabled in branch-1.1 testing a couple of weeks ago
(HBASE-13815 - originally I thought the rollback had some flaw, but after
carefully examining the code I think the rollback is correct). I applied the
same skip logic locally and we have not seen this problem again in branch-1.1
testing.
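As a rough illustration of what skipping the state check could look like, here is a minimal sketch. It is not the attached patch: the `skipTableStateCheck` flag, the `prepare` method, the in-memory state map, and `prepareSucceeds` are hypothetical names, and `IllegalStateException` stands in for `TableExistsException`.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of a create-table check that can be skipped; not HBase code.
class SkipCheckSketch {
    enum State { ENABLING, ENABLED }

    // Stand-in for the table state znodes kept in ZooKeeper.
    private final Map<String, State> znodes = new HashMap<>();

    // Simulates a znode left behind by a previous, failed master.
    void leaveOrphan(String table, State state) {
        znodes.put(table, state);
    }

    // Hypothetical prepare step: when skipTableStateCheck is true, a leftover
    // ENABLING/ENABLED entry is overwritten instead of rejecting the creation.
    void prepare(String table, boolean skipTableStateCheck) {
        State current = znodes.get(table);
        boolean alreadyMarked = current == State.ENABLING || current == State.ENABLED;
        if (alreadyMarked && !skipTableStateCheck) {
            throw new IllegalStateException("table exists: " + table);
        }
        znodes.put(table, State.ENABLING);
    }

    // Returns true if prepare completes without rejecting the table.
    static boolean prepareSucceeds(State orphan, boolean skipTableStateCheck) {
        SkipCheckSketch model = new SkipCheckSketch();
        model.leaveOrphan("hbase:namespace", orphan);
        try {
            model.prepare("hbase:namespace", skipTableStateCheck);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("check enforced: " + prepareSucceeds(State.ENABLED, false));
        System.out.println("check skipped:  " + prepareSucceeds(State.ENABLED, true));
    }
}
```

A flag like this would presumably be set only for the namespace table during master initialization, so that ordinary create-table requests still get the existing-table check.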
> Orphaned namespace table ZK node should not prevent master to start
> -------------------------------------------------------------------
>
> Key: HBASE-13935
> URL: https://issues.apache.org/jira/browse/HBASE-13935
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 1.0.0, 0.98.13
> Reporter: Stephen Yuan Jiang
> Assignee: Stephen Yuan Jiang
> Fix For: 0.98.14, 1.0.2
>
> Attachments: HBASE-13935.v1-0.98.patch,
> HBASE-13935.v1-branch-1.0.patch
>
>
> Before the state-of-the-art Procedure V2 feature (HBase 1.0 release or
> older), we frequently see the following issue (an orphaned ZK node) that
> prevents the master from starting (at least in testing):
> {noformat}
> 2015-06-16 17:54:36,472 FATAL [master:10.0.0.99:60000] master.HMaster: Unhandled exception. Starting shutdown.
> org.apache.hadoop.hbase.TableExistsException: hbase:namespace
>     at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:137)
>     at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:232)
>     at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
>     at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1123)
>     at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:947)
>     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:618)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-06-16 17:54:36,472 INFO [master:10.0.0.99:60000] master.HMaster: Aborting
> {noformat}
> The above call trace is from a 0.98.x test run. We saw a similar issue in
> a 1.0.x run, too.
> The proposed fix is to ignore the ZK node and force namespace table creation
> to complete so that the master can start successfully.