[
https://issues.apache.org/jira/browse/HBASE-25381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sanjeet Nishad resolved HBASE-25381.
------------------------------------
Resolution: Won't Fix
> RegionServer ignored a procedure (CloseRegionProcedure) due to a duplicate pid,
> which led to the region being stuck in RIT.
> ---------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-25381
> URL: https://issues.apache.org/jira/browse/HBASE-25381
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.2.3
> Reporter: Sanjeet Nishad
> Priority: Major
>
> Analysis:
> 1. After an HMaster failover, the master's in-memory proc-id was reset.
> 2. On a new DisableTable client request, the master dispatched a
> CloseRegionProcedure to the RS and suspended the procedure.
> 3. The RS ignored this CloseRegionProcedure request without doing
> anything, because it had already executed a procedure with the same id.
> Because no UnAssignRegionHandler was created in step 3, the RS did not send
> any reportRegionStateTransition to the HM. On the HMaster side the procedure
> remained suspended, because a suspended procedure is woken up on
> reportRegionStateTransition. The region therefore stays stuck in RIT until
> the HM or the RS is restarted.
>
> The following log was observed on the RS side while trying to disable table 't2':
> {code:java}
> 2020-12-08 10:18:23,216 | WARN |
> RpcServer.priority.RWQ.Fifo.read.handler=164,queue=2,port=21302 | Received
> procedure pid=13, which already executed, just ignore it |
> org.apache.hadoop.hbase.regionserver.HRegionServer.submitRegionProcedure(HRegionServer.java:4146){code}
> pid=13 had already been used by the RS to open hbase:namespace:
> {code:java}
> 2020-12-08 10:11:40,793 | INFO |
> RS_OPEN_PRIORITY_REGION-regionserver/a.b.c.d:efg-0 | Post open deploy tasks
> for hbase:namespace,,1607152197100.cffc166aa75ee4ddf8a210ca02da1ea1., pid=13,
> masterSystemTime=1607393499851 |
> org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:2422){code}
> So the region of table 't2' stayed in RIT because the CloseRegionProcedure
> was stuck on the master side indefinitely:
> {code:java}
> 2020-12-08 10:18:23,039 | INFO | PEWorker-15 | Updated tableName=t2,
> state=DISABLING in hbase:meta |
> org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1770)
> 2020-12-08 10:18:23,040 | INFO | PEWorker-15 | Set t2 to state=DISABLING |
> org.apache.hadoop.hbase.master.procedure.DisableTableProcedure.setTableStateToDisabling(DisableTableProcedure.java:296)
> 2020-12-08 10:18:23,042 | INFO | PEWorker-15 | Initialized
> subprocedures=[{pid=12, ppid=11,
> state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure
> table=t2, region=213e5f89d48161a93b226ba2717b14fd, UNASSIGN}] |
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1704)
> 2020-12-08 10:18:23,045 | INFO | PEWorker-2 | Took xlock for pid=12, ppid=11,
> state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure
> table=t2, region=213e5f89d48161a93b226ba2717b14fd, UNASSIGN |
> org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler.waitRegions(MasterProcedureScheduler.java:737)
> 2020-12-08 10:18:23,047 | INFO | PEWorker-2 | pid=12 updating hbase:meta
> row=213e5f89d48161a93b226ba2717b14fd, regionState=CLOSING,
> regionLocation=100-112-24-246,21302,1607392837508 |
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateUserRegionLocation(RegionStateStore.java:217)
> 2020-12-08 10:18:23,055 | INFO | PEWorker-2 | Initialized
> subprocedures=[{pid=13, ppid=12, state=RUNNABLE; CloseRegionProcedure
> 213e5f89d48161a93b226ba2717b14fd, server=100-112-24-246,21302,1607392837508}]
> |
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1704){code}
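
The failure mode in the analysis above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not the HRegionServer code: ToyMaster, ToyRegionServer, Result and replayScenario are hypothetical names, the RS is assumed to remember executed pids in a plain set, and the failover is modelled as a new master whose pid counter restarts at a value that was already handed out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the scenario in this issue.
// It is NOT the actual HBase implementation.
class DuplicatePidSketch {

  /** Toy region server that remembers which pids it has already executed. */
  static final class ToyRegionServer {
    private final Set<Long> executedPids = new HashSet<>();
    private final List<String> reportsToMaster = new ArrayList<>();

    /** Returns true if the request ran, false if it was dropped as a duplicate. */
    boolean submitRegionProcedure(long pid, String action) {
      if (!executedPids.add(pid)) {
        // Mirrors the WARN line in the report: the request is ignored,
        // so no handler runs and nothing is reported back to the master.
        return false;
      }
      reportsToMaster.add("pid=" + pid + " " + action + " done");
      return true;
    }

    int reportsSent() {
      return reportsToMaster.size();
    }
  }

  /** Toy master whose pid counter lives only in memory (assumption). */
  static final class ToyMaster {
    private long nextPid;

    ToyMaster(long firstPid) {
      this.nextPid = firstPid;
    }

    long newPid() {
      return nextPid++;
    }
  }

  /** Outcome of the replayed scenario. */
  static final class Result {
    final boolean openAccepted;
    final boolean closeAccepted;
    final int reportsSent;

    Result(boolean openAccepted, boolean closeAccepted, int reportsSent) {
      this.openAccepted = openAccepted;
      this.closeAccepted = closeAccepted;
      this.reportsSent = reportsSent;
    }
  }

  static Result replayScenario() {
    ToyRegionServer rs = new ToyRegionServer();

    // Before failover: pid=13 is used to open a region.
    ToyMaster beforeFailover = new ToyMaster(13);
    boolean openAccepted =
        rs.submitRegionProcedure(beforeFailover.newPid(), "open hbase:namespace");

    // After failover: the counter is reset, so pid=13 is handed out again,
    // this time for closing a region of table t2.
    ToyMaster afterFailover = new ToyMaster(13);
    boolean closeAccepted =
        rs.submitRegionProcedure(afterFailover.newPid(), "close region of t2");

    return new Result(openAccepted, closeAccepted, rs.reportsSent());
  }
}
```

Calling DuplicatePidSketch.replayScenario() accepts the open request, drops the close request, and leaves a single report sent to the master. In the terms of this issue, the master-side procedure waiting for the second report would never be woken up.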