stack created HBASE-21222:
-----------------------------
Summary: [amv2] Closing region on a non-existent server creates
STUCK regions
Key: HBASE-21222
URL: https://issues.apache.org/jira/browse/HBASE-21222
Project: HBase
Issue Type: Bug
Components: amv2
Reporter: stack
Assignee: stack
Ran into this one: a Region had been on a server, but after a bunch of
crashing and meddling in the Master Proc WALs that server no longer exists.
Any attempt at unassign now has the procedure fail (see below), and the
region is then reported as STUCK.
I broke the lock with the new hbck2 tooling and then tried to offline the
region again, but the same thing happened. Bug. Fix.
{code}
2018-09-22 18:36:41,900 INFO
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Dispatch
pid=138650, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH,
locked=true; UnassignProcedure
table=IntegrationTestBigLinkedList_20180614072614,
region=51cdade76ca7217ec191f39e5f56c61c,
server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING,
location=vd0637.halxg.cloudera.com,22101,1537397969558
2018-09-22 18:36:41,899 INFO
org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: pid=138646,
ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure
table=IntegrationTestBigLinkedList_20180614072614,
region=0780467efe4c5901887fb12bfa406fa7,
server=vc1228.halxg.cloudera.com,22101,1537578279837 checking lock on
0780467efe4c5901887fb12bfa406fa7
2018-09-22 18:36:41,900 WARN
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Remote
call failed vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650,
ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true;
UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614,
region=51cdade76ca7217ec191f39e5f56c61c,
server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING,
location=vd0637.halxg.cloudera.com,22101,1537397969558;
exception=NoServerDispatchException
org.apache.hadoop.hbase.procedure2.NoServerDispatchException:
vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650, ppid=121871,
state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure
table=IntegrationTestBigLinkedList_20180614072614,
region=51cdade76ca7217ec191f39e5f56c61c,
server=vd0637.halxg.cloudera.com,22101,1537397969558
    at org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:177)
    at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.addToRemoteDispatcher(RegionTransitionProcedure.java:277)
    at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:202)
    at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:370)
    at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
    at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:924)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1684)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1471)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:77)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1983)
2018-09-22 18:36:41,903 WARN
org.apache.hadoop.hbase.master.assignment.UnassignProcedure: Expiring
vd0637.halxg.cloudera.com,22101,1537397969558, pid=138650, ppid=121871,
state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure
table=IntegrationTestBigLinkedList_20180614072614,
region=51cdade76ca7217ec191f39e5f56c61c,
server=vd0637.halxg.cloudera.com,22101,1537397969558 rit=CLOSING,
location=vd0637.halxg.cloudera.com,22101,1537397969558;
exception=NoServerDispatchException
{code}
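To make the failure mode in the log above concrete, here is a toy sketch of it, not the HBase code and not necessarily the fix that should go in. It assumes a dispatcher that only knows about registered servers and throws when asked to close a region on one it has never heard of. Every name in it (DispatchSketch, ToyDispatcher, NoServerException, unassign) is hypothetical. The policy shown, treating an unknown server as "nothing is hosting this region, so call it CLOSED", is an assumption for illustration and ignores any recovery the real master would have to do for that server.

```java
import java.util.HashSet;
import java.util.Set;

public class DispatchSketch {
    /** Hypothetical stand-in for the NoServerDispatchException in the log. */
    static class NoServerException extends Exception {
        NoServerException(String server) {
            super("no such server: " + server);
        }
    }

    enum RegionState { CLOSING, CLOSED }

    /** Toy dispatcher: only servers registered here can be sent a close. */
    static class ToyDispatcher {
        private final Set<String> knownServers = new HashSet<>();

        void addServer(String server) {
            knownServers.add(server);
        }

        void dispatchClose(String server, String region) throws NoServerException {
            if (!knownServers.contains(server)) {
                throw new NoServerException(server);
            }
            // A real dispatcher would queue a remote close call here.
        }
    }

    /**
     * Try to unassign a region. If the target server is unknown, do not
     * leave the region in CLOSING forever; mark it CLOSED instead.
     * (Assumed policy, for illustration only.)
     */
    static RegionState unassign(ToyDispatcher dispatcher, String server, String region) {
        try {
            dispatcher.dispatchClose(server, region);
            return RegionState.CLOSING; // now waiting on the server to report back
        } catch (NoServerException e) {
            return RegionState.CLOSED; // nobody to ask, so nothing is hosting it
        }
    }

    public static void main(String[] args) {
        ToyDispatcher dispatcher = new ToyDispatcher();
        dispatcher.addServer("live-server,22101,1");
        String region = "51cdade76ca7217ec191f39e5f56c61c";

        System.out.println(unassign(dispatcher, "live-server,22101,1", region));
        System.out.println(unassign(dispatcher,
            "vd0637.halxg.cloudera.com,22101,1537397969558", region));
    }
}
```

The point of the sketch is only that the "no such server" case needs a terminal outcome of its own; retrying the same dispatch can never succeed, which is how the region ends up STUCK.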