[ 
https://issues.apache.org/jira/browse/HBASE-26883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512704#comment-17512704
 ] 

Duo Zhang commented on HBASE-26883:
-----------------------------------

Why did C1HM2 delete table FAVMyInfo? Is it executing the TruncateTableProcedure?

> Crash HM and META-RS when truncating table causes data loss
> -----------------------------------------------------------
>
>                 Key: HBASE-26883
>                 URL: https://issues.apache.org/jira/browse/HBASE-26883
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.4.8
>            Reporter: May
>            Priority: Major
>         Attachments: hbase-root-master-C1HM1.log, 
> hbase-root-master-C1HM2.log, hbase-root-regionserver-C1RS1.log, 
> hbase-root-regionserver-C1RS2.log, hbase-root-regionserver-C1RS3.log
>
>
> I have an HBase cluster of 2 master nodes (C1HM1:172.0.0.2 and C1HM2:172.0.0.3) 
> and 3 slave nodes (C1RS1:172.0.0.4, C1RS2:172.0.0.5, C1RS3:172.0.0.6). The 
> following is the bug-triggering process ({color:#de350b}red {color}for crash 
> and exception events; {color:#00875a}green {color}for events in HBase nodes):
>  * the active master is C1HM1, the meta server is C1RS2
>  * client requests to create a table "FAVMyInfo"
>  * {color:#de350b}2022-03-24 01:44:54,317 [MyTest] - INFO - Going to crash 
> node 172.25.0.5 (table "FAVMyInfo" exists in the ZooKeeper at this time){color}
>  * {color:#00875a}C1HM1 assigns meta regions to C1RS3{color}
>  * client receives ACK about creating table "FAVMyInfo", and then makes some 
> changes to "FAVMyInfo"
>  * 2022-03-24 01:46:20,664 [MyTest] - INFO - Client has disabled table 
> "FAVMyInfo" and requests to truncate "FAVMyInfo"
>  * {color:#00875a}C1HM1 first deleted table "FAVMyInfo", and then created a 
> new "FAVMyInfo" region;{color}
>  * {color:#de350b}2022-03-24 01:46:23,302 [MyTest] - INFO - Going to crash 
> node 172.25.0.2 (at this time, table "FAVMyInfo" does not exist in the 
> ZooKeeper). The node C1HM1 was killed before:{color}
> {code:java}
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$BlockingStub.mutate(ClientProtos.java),
> org.apache.hadoop.hbase.client.ClientServiceCallable.doMutate(ClientServiceCallable.java:55),
> org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:534),
> org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:529),
> org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127),
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:108),
> org.apache.hadoop.hbase.client.HTable.put(HTable.java:538),
> org.apache.hadoop.hbase.MetaTableAccessor.put(MetaTableAccessor.java:1365),
> org.apache.hadoop.hbase.MetaTableAccessor.putToMetaTable(MetaTableAccessor.java:1355),
> org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1657),
> org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1169),
> org.apache.hadoop.hbase.master.TableStateManager.updateMetaState(TableStateManager.java:174),
> org.apache.hadoop.hbase.master.TableStateManager.setTableState(TableStateManager.java:84),
> org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.setEnablingState(CreateTableProcedure.java:376),
> org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure.executeFromState(TruncateTableProcedure.java:140),
> org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure.executeFromState(TruncateTableProcedure.java:45),
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:191),
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:956),
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1665),
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1412),
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78),
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1979),
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java)
>  {code}
>  * {color:#de350b}2022-03-24 01:46:46,023 [MyTest] - INFO - Going to crash 
> node 172.25.0.6. The node C1RS3 was killed before:{color}
> {code:java}
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java), 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45253),
> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392), 
> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133), 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:354), 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:334), 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java)]{code}
>  * {color:#00875a}C1HM2 takes over the cluster, and unexpectedly deletes 
> table FAVMyInfo{color}
>  * 2022-03-24 01:48:13,497 [MyTest] - INFO - the client receives a response for 
> truncating "FAVMyInfo", with no exceptions. The client then requests a row 
> from "FAVMyInfo", but gets {color:#ff0000}"TableNotFoundException: 
> FAVMyInfo"{color}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
