[ https://issues.apache.org/jira/browse/HBASE-20690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784549#comment-16784549 ]
Xiang Li commented on HBASE-20690: ---------------------------------- Hi [~xucang] I have been working on this JIRA for some days, and would like to provide some updates. # I think the patch v002 uploaded by [~andrewcheng] is the right way to go # Some potential issues are also introduced by patch v002, as described below. Patch v002 moves the following operations from postCreateTable() to preCreateTableAction() # Decide which rsgroup to go to according to table's namespace # Call RSGroupInfoManagerImpl#moveTables() to update the rsgroup information (but do not move the regions actually) The changes above also affects the procedure to create hbase:rsgroup table when HMaster starts, by triggering a race condition on cluster schema service. In HMaster#finishActiveMasterInitialization(), hbase:rsgroup is created by {code} this.balancer.initialize(); // line 1060 {code} by HMaster#createSystemTable() internally, in which, a CreateTableProcedure is scheduled. preCreateTableAction is called and the following statement is called to determine the namespace {code} String groupName = master.getClusterSchema().getNamespace(desc.getTableName().getNamespaceAsString()) .getConfigurationValue(RSGroupInfo.NAMESPACE_DESC_PROP_GROUP); {code} But getClusterSchema might return null because the cluster schema service is not ready yet. Actually, it is not ready until the following statement is called in HMaster#finishActiveMasterInitialization() {code} initClusterSchemaService(); // line 1132 {code} > Moving table to target rsgroup needs to handle TableStateNotFoundException > -------------------------------------------------------------------------- > > Key: HBASE-20690 > URL: https://issues.apache.org/jira/browse/HBASE-20690 > Project: HBase > Issue Type: Bug > Reporter: Ted Yu > Assignee: Xiang Li > Priority: Major > Attachments: HBASE-20690.master.001.patch, > HBASE-20690.master.002.patch > > > This is related code: > {code} > if (targetGroup != null) { > for (TableName table: tables) { > if (master.getAssignmentManager().isTableDisabled(table)) { > LOG.debug("Skipping move regions because the table" + table + " is > disabled."); > continue; > } > {code} > In a stack trace [~rmani] showed me: > {code} > 2018-06-06 07:10:44,893 ERROR > [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=20000] > master.TableStateManager: Unable to get table demo:tbl1 state > org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: > demo:tbl1 > at > org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:193) > at > org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:143) > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:346) > at > org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:407) > at > org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.assignTableToGroup(RSGroupAdminEndpoint.java:447) > at > org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postCreateTable(RSGroupAdminEndpoint.java:470) > at > org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:334) > at > org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:331) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614) > at > org.apache.hadoop.hbase.master.MasterCoprocessorHost.postCreateTable(MasterCoprocessorHost.java:331) > at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:1768) > at > org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131) > at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1750) > at > org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:593) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) > {code} > The logic should take potential TableStateNotFoundException into account. -- This message was sent by Atlassian JIRA (v7.6.3#76005)