[ https://issues.apache.org/jira/browse/HBASE-21222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625812#comment-16625812 ]

Duo Zhang commented on HBASE-21222:
-----------------------------------

Got it. So we need a tool in HBCK2 to handle this case.

> [amv2] Closing region on a non-existent server creates STUCK regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-21222
>                 URL: https://issues.apache.org/jira/browse/HBASE-21222
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>            Reporter: stack
>            Assignee: stack
>            Priority: Major
>
> Ran into a case where a region had been on a server, but after a bunch of
> crashing and meddling in the Master Proc WALs, any attempt to unassign it has
> the procedure fail (see below) and then report the region as STUCK.
> I broke the lock with the new hbck2 tooling and then tried to offline the
> region again, but the same thing happened. Bug. Fix.
> {code}
> 2018-09-22 18:36:41,900 INFO 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Dispatch 
> pid=138650, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, 
> locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING, 
> location=vd0637.halxg.cloudera.com,22101,1537397969558
> 2018-09-22 18:36:41,899 INFO 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: 
> pid=138646, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
> UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614, 
> region=0780467efe4c5901887fb12bfa406fa7, 
> server=vc1228.halxg.cloudera.com,22101,1537578279837 checking lock on 
> 0780467efe4c5901887fb12bfa406fa7
> 2018-09-22 18:36:41,900 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Remote 
> call failed vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650, 
> ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; 
> UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING, 
> location=vd0637.halxg.cloudera.com,22101,1537397969558; 
> exception=NoServerDispatchException
> org.apache.hadoop.hbase.procedure2.NoServerDispatchException: 
> vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650, ppid=121871, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558
>         at 
> org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:177)
>         at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.addToRemoteDispatcher(RegionTransitionProcedure.java:277)
>         at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:202)
>         at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:370)
>         at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:924)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1684)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1471)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:77)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1983)
> 2018-09-22 18:36:41,903 WARN 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure: Expiring 
> vd0637.halxg.cloudera.com,22101,1537397969558, pid=138650, ppid=121871, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558 rit=CLOSING, 
> location=vd0637.halxg.cloudera.com,22101,1537397969558; 
> exception=NoServerDispatchException
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
