[
https://issues.apache.org/jira/browse/HBASE-26883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512704#comment-17512704
]
Why C1HM@ deleted table FAVMyInfo? It is executing the TruncateTableProcedure?
> Crash HM and META-RS when truncating table causes data loss
> -----------------------------------------------------------
>
> Key: HBASE-26883
> URL: https://issues.apache.org/jira/browse/HBASE-26883
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.4.8
> Reporter: May
> Priority: Major
> Attachments: hbase-root-master-C1HM1.log,
> hbase-root-master-C1HM2.log, hbase-root-regionserver-C1RS1.log,
> hbase-root-regionserver-C1RS2.log, hbase-root-regionserver-C1RS3.log
>
>
> I have an HBase cluster of 2 master nodes (C1HM1:172.25.0.2 and C1HM2:172.25.0.3)
> and 3 slave nodes (C1RS1:172.25.0.4, C1RS2:172.25.0.5, C1RS3:172.25.0.6). The
> following is the bug-triggering process ({color:#de350b}red {color}for crash
> and exception events; {color:#00875a}green {color}for events on the HBase
> nodes); a client-side sketch of the sequence is included after the list:
> * the active master is C1HM1, the meta server is C1RS2
> * client requests to create a table "FAVMyInfo"
> * {color:#de350b}2022-03-24 01:44:54,317 [MyTest] - INFO - Going to crash
> node 172.25.0.5 (table "FAVMyInfo" exists in ZooKeeper at this time){color}
> * {color:#00875a}C1HM1 assigns meta regions to C1RS3{color}
> * client receives ACK about creating table "FAVMyInfo", and then makes some
> changes to "FAVMyInfo"
> * 2022-03-24 01:46:20,664 [MyTest] - INFO - Client has disabled table
> "FAVMyInfo" and requests to truncate "FAVMyInfo"
> * {color:#00875a}C1HM1 first deletes table "FAVMyInfo", and then creates a
> new "FAVMyInfo" region;{color}
> * {color:#de350b}2022-03-24 01:46:23,302 [MyTest] - INFO - Going to crash
> node 172.25.0.2 (at this time, table "FAVMyInfo" does not exist in
> ZooKeeper). The node C1HM1 was killed at the following point:{color}
> {code:java}
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$BlockingStub.mutate(ClientProtos.java),
> org.apache.hadoop.hbase.client.ClientServiceCallable.doMutate(ClientServiceCallable.java:55),
> org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:534),
> org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:529),
> org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127),
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:108),
> org.apache.hadoop.hbase.client.HTable.put(HTable.java:538),
> org.apache.hadoop.hbase.MetaTableAccessor.put(MetaTableAccessor.java:1365),
> org.apache.hadoop.hbase.MetaTableAccessor.putToMetaTable(MetaTableAccessor.java:1355),
> org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1657),
> org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1169),
> org.apache.hadoop.hbase.master.TableStateManager.updateMetaState(TableStateManager.java:174),
> org.apache.hadoop.hbase.master.TableStateManager.setTableState(TableStateManager.java:84),
> org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.setEnablingState(CreateTableProcedure.java:376),
> org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure.executeFromState(TruncateTableProcedure.java:140),
> org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure.executeFromState(TruncateTableProcedure.java:45),
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:191),
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:956),
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1665),
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1412),
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78),
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1979),
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java)
> {code}
> * {color:#de350b}2022-03-24 01:46:46,023 [MyTest] - INFO - Going to crash
> node 172.25.0.6. The node C1RS3 was killed at the following point:{color}
> {code:java}
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java),
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45253),
> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392),
> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133),
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:354),
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:334),
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java)
> {code}
> * {color:#00875a}C1HM2 takes over the cluster and unexpectedly deletes
> table FAVMyInfo{color}
> * 2022-03-24 01:48:13,497 [MyTest] - INFO - The client receives the response
> for truncating "FAVMyInfo" with no exception. The client then requests a row
> from "FAVMyInfo", but gets {color:#ff0000}"TableNotFoundException:
> FAVMyInfo"{color}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)