[
https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631395#comment-16631395
]
stack commented on HBASE-21213:
-------------------------------
bq. It doesn't seem right: if the children procedures were bypassed
correctly, their parent should not be stuck. Is it because some of the
Procedure WALs were deleted?
Agree it doesn't seem right, but the logs have rolled away (although I suppose
I could check whether they are in the archive...) and I'm in this 'situation'
that I need to fix.
I like the idea of a recursive bypass. Let me work on it.
> [hbck2] bypass leaves behind state in RegionStates when assign/unassign
> -----------------------------------------------------------------------
>
> Key: HBASE-21213
> URL: https://issues.apache.org/jira/browse/HBASE-21213
> Project: HBase
> Issue Type: Bug
> Components: amv2, hbck2
> Reporter: stack
> Assignee: stack
> Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21213.branch-2.1.001.patch,
> HBASE-21213.branch-2.1.002.patch, HBASE-21213.branch-2.1.003.patch,
> HBASE-21213.branch-2.1.004.patch, HBASE-21213.branch-2.1.005.patch,
> HBASE-21213.branch-2.1.006.patch, HBASE-21213.branch-2.1.007.patch,
> HBASE-21213.branch-2.1.007.patch
>
>
> This is a follow-on from HBASE-21083 which added the 'bypass' functionality.
> On bypass, there is more state to be cleared if we are to allow new
> Procedures to be scheduled.
> For example, here is a bypass:
> {code}
> 2018-09-20 05:45:43,722 INFO org.apache.hadoop.hbase.procedure2.Procedure:
> pid=100449, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true,
> bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace,
> region=37cc206fe9c4bc1c0a46a34c5f523d16,
> server=ve1233.halxg.cloudera.com,22101,1537397961664 bypassed, returning null
> to finish it
> 2018-09-20 05:45:44,022 INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=100449,
> state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace,
> region=37cc206fe9c4bc1c0a46a34c5f523d16,
> server=ve1233.halxg.cloudera.com,22101,1537397961664 in 2mins, 7.618sec
> {code}
> ... but then when I try to assign the bypassed region later, I get this:
> {code}
> 2018-09-20 05:46:31,435 WARN
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: There is
> already another procedure running on this region this=pid=100450,
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16,
> server=ve1233.halxg.cloudera.com,22101,1537397961664 pid=100450,
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16; rit=OPENING,
> location=ve1233.halxg.cloudera.com,22101,1537397961664
> 2018-09-20 05:46:31,510 INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Rolled back pid=100450,
> state=ROLLEDBACK,
> exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via
> AssignProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException:
> There is already another procedure running on this region this=pid=100450,
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16,
> server=ve1233.halxg.cloudera.com,22101,1537397961664; AssignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16
> exec-time=473msec
> {code}
> ... which is a long-winded way of saying the Unassign Procedure still exists
> in RegionStateNodes.
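
To make the failure mode in the logs above concrete, here is a minimal sketch of the pattern, not the actual HBase code. Every class and method name in it (ProcSketch, RegionNodeSketch, RegionStatesSketch, attach, bypassLeavingState, bypassAndClear) is hypothetical and only models the idea that a per-region node remembers which procedure owns it. If bypass finishes the procedure without detaching it from the node, a later assign is refused; clearing the reference on bypass lets the new procedure through.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model. These are NOT HBase's real classes.
final class ProcSketch {
    final long pid;
    final String kind;
    boolean finished;

    ProcSketch(long pid, String kind) {
        this.pid = pid;
        this.kind = kind;
    }
}

// Stand-in for a per-region state node that tracks its owning procedure.
final class RegionNodeSketch {
    final String encodedName;
    ProcSketch owner; // procedure currently attached to the region, or null

    RegionNodeSketch(String encodedName) {
        this.encodedName = encodedName;
    }
}

final class RegionStatesSketch {
    private final Map<String, RegionNodeSketch> nodes = new HashMap<>();

    // Attach proc to the region; refuse if a different procedure is attached.
    boolean attach(String region, ProcSketch proc) {
        RegionNodeSketch node = nodes.computeIfAbsent(region, RegionNodeSketch::new);
        if (node.owner != null && node.owner != proc) {
            return false; // "already another procedure running on this region"
        }
        node.owner = proc;
        return true;
    }

    // Bypass that only finishes the procedure: the node keeps a stale owner.
    void bypassLeavingState(ProcSketch proc) {
        proc.finished = true;
    }

    // Bypass that also detaches the procedure from the region node.
    void bypassAndClear(String region, ProcSketch proc) {
        proc.finished = true;
        RegionNodeSketch node = nodes.get(region);
        if (node != null && node.owner == proc) {
            node.owner = null;
        }
    }
}

final class BypassSketchDemo {
    public static void main(String[] args) {
        RegionStatesSketch states = new RegionStatesSketch();
        String region = "37cc206fe9c4bc1c0a46a34c5f523d16";
        ProcSketch unassign = new ProcSketch(100449L, "Unassign");

        states.attach(region, unassign);
        states.bypassLeavingState(unassign);
        // The finished Unassign is still attached, so the Assign is refused.
        System.out.println(states.attach(region, new ProcSketch(100450L, "Assign")));

        states.bypassAndClear(region, unassign);
        // With the stale reference cleared, a new Assign can attach.
        System.out.println(states.attach(region, new ProcSketch(100451L, "Assign")));
    }
}
```

The sketch deliberately ignores locking, persistence to the Procedure WAL, and parent/child procedures; it only shows why a procedure that reports state=SUCCESS can still block the region.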