[ https://issues.apache.org/jira/browse/HBASE-25381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sanjeet Nishad resolved HBASE-25381.
------------------------------------
    Resolution: Won't Fix

> RegionServer ignored a procedure (CloseRegionProcedure) due to duplicate pid, 
> which led the region to get stuck in RIT.
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-25381
>                 URL: https://issues.apache.org/jira/browse/HBASE-25381
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.2.3
>            Reporter: Sanjeet Nishad
>            Priority: Major
>
> Analysis:
> 1. After an HMaster failover, the master's in-memory procedure ID (pid) 
> sequence was reset.
> 2. Upon a new DisableTable client request, the Master dispatched a 
> CloseRegionProcedure to the RS and suspended the procedure.
> 3. But the RS ignored the new CloseRegionProcedure request without doing 
> anything, because it had already executed a procedure with the same pid.
> Since no UnassignRegionHandler was created in step 3, the RS never sent a 
> reportRegionStateTransition to the HMaster. On the HMaster side the procedure 
> remained suspended, because a suspended procedure is only woken up when 
> reportRegionStateTransition arrives. So the region stays stuck in RIT until 
> either the HMaster or the RS is restarted.
>  
> The following log was observed on the RS side while trying to disable table 't2':
> {code:java}
> 2020-12-08 10:18:23,216 | WARN | RpcServer.priority.RWQ.Fifo.read.handler=164,queue=2,port=21302 | Received procedure pid=13, which already executed, just ignore it | org.apache.hadoop.hbase.regionserver.HRegionServer.submitRegionProcedure(HRegionServer.java:4146){code}
> This pid=13 had already been used by the RS when opening hbase:namespace:
> {code:java}
> 2020-12-08 10:11:40,793 | INFO | RS_OPEN_PRIORITY_REGION-regionserver/a.b.c.d:efg-0 | Post open deploy tasks for hbase:namespace,,1607152197100.cffc166aa75ee4ddf8a210ca02da1ea1., pid=13, masterSystemTime=1607393499851 | org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:2422){code}
> So the region of table 't2' was stuck in RIT because its CloseRegionProcedure 
> remained suspended on the master side indefinitely:
> {code:java}
> 2020-12-08 10:18:23,039 | INFO | PEWorker-15 | Updated tableName=t2, state=DISABLING in hbase:meta | org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1770)
> 2020-12-08 10:18:23,040 | INFO | PEWorker-15 | Set t2 to state=DISABLING | org.apache.hadoop.hbase.master.procedure.DisableTableProcedure.setTableStateToDisabling(DisableTableProcedure.java:296)
> 2020-12-08 10:18:23,042 | INFO | PEWorker-15 | Initialized subprocedures=[{pid=12, ppid=11, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure table=t2, region=213e5f89d48161a93b226ba2717b14fd, UNASSIGN}] | org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1704)
> 2020-12-08 10:18:23,045 | INFO | PEWorker-2 | Took xlock for pid=12, ppid=11, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure table=t2, region=213e5f89d48161a93b226ba2717b14fd, UNASSIGN | org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler.waitRegions(MasterProcedureScheduler.java:737)
> 2020-12-08 10:18:23,047 | INFO | PEWorker-2 | pid=12 updating hbase:meta row=213e5f89d48161a93b226ba2717b14fd, regionState=CLOSING, regionLocation=100-112-24-246,21302,1607392837508 | org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateUserRegionLocation(RegionStateStore.java:217)
> 2020-12-08 10:18:23,055 | INFO | PEWorker-2 | Initialized subprocedures=[{pid=13, ppid=12, state=RUNNABLE; CloseRegionProcedure 213e5f89d48161a93b226ba2717b14fd, server=100-112-24-246,21302,1607392837508}] | org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1704){code}
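The failure sequence in the analysis can be reproduced with a small toy model. This is a minimal sketch under stated assumptions, not HBase code: `DuplicatePidSketch`, `Master`, `RegionServer`, `allocatePid`, `dispatch` and `isStuck` are hypothetical names, RPCs are modelled as plain synchronous method calls, and the RS is assumed to deduplicate on the pid alone, which is what the "already executed, just ignore it" warning suggests. The pid values (hbase:namespace opened as pid=13, then pids 11, 12 and 13 for the disable after failover) are taken from the logs above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical toy model of HBASE-25381. None of these classes or methods are
 * HBase APIs; they only imitate the behaviour described in the issue.
 */
public class DuplicatePidSketch {

    /** RS side (assumed): remembers every pid it has run and ignores repeats. */
    static class RegionServer {
        private final Set<Long> executedPids = new HashSet<>();
        private Master master; // receiver of reportRegionStateTransition

        void connect(Master master) {
            this.master = master;
        }

        void submitRegionProcedure(long pid, String region) {
            if (!executedPids.add(pid)) {
                // Mirrors the WARN line in the RS log; no handler is created,
                // so nothing will ever report back to the master.
                System.out.println("WARN Received procedure pid=" + pid
                        + ", which already executed, just ignore it");
                return;
            }
            // Stand-in for an open/close handler that finishes and reports.
            master.reportRegionStateTransition(pid, region);
        }
    }

    /** Master side (assumed): in-memory pid counter plus suspended procedures. */
    static class Master {
        private long nextPid;
        private final Map<Long, String> suspended = new HashMap<>();

        Master(long firstPid) {
            this.nextPid = firstPid; // a fresh master starts its counter here
        }

        long allocatePid() {
            return nextPid++;
        }

        /** Dispatch a region procedure to the RS and suspend until it reports. */
        long dispatch(RegionServer rs, String region) {
            long pid = allocatePid();
            suspended.put(pid, region);
            rs.submitRegionProcedure(pid, region);
            return pid;
        }

        /** In this model, the only event that wakes a suspended procedure. */
        void reportRegionStateTransition(long pid, String region) {
            suspended.remove(pid);
        }

        boolean isStuck(long pid) {
            return suspended.containsKey(pid);
        }
    }

    public static void main(String[] args) {
        RegionServer rs = new RegionServer();

        // Before failover: the old master opens hbase:namespace as pid=13.
        Master oldMaster = new Master(13);
        rs.connect(oldMaster);
        long openPid = oldMaster.dispatch(rs, "hbase:namespace");
        System.out.println("pid=" + openPid + " stuck? " + oldMaster.isStuck(openPid));

        // After failover the counter restarts; pids 11 and 12 go to
        // DisableTableProcedure and TransitRegionStateProcedure, as in the log.
        Master newMaster = new Master(11);
        rs.connect(newMaster);
        newMaster.allocatePid();
        newMaster.allocatePid();
        long closePid = newMaster.dispatch(rs, "t2,213e5f89d48161a93b226ba2717b14fd");
        System.out.println("pid=" + closePid + " stuck? " + newMaster.isStuck(closePid));
    }
}
```

Running `main` prints `pid=13 stuck? false`, then the RS warning, then `pid=13 stuck? true`: the second pid=13 is dropped by the RS, no report ever reaches the new master, and nothing wakes the suspended close, which matches the RIT described in the issue.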



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
