[ 
https://issues.apache.org/jira/browse/HBASE-20817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528122#comment-16528122
 ] 

Josh Elser commented on HBASE-20817:
------------------------------------

{quote}Clear them. Sounds like you know how to repro?
{quote}
I don't yet, actually. I have some high-level tests internally (for Apache 
Atlas) which seem to trigger these now and again, but I haven't figured out the 
secret as to what's different when we do see it.
{quote}Paste in the cycle? AMv2 should recognize stuck cycles and put the 
procedure aside for intervention rather than cycle until infinity.
{quote}
{noformat}
2018-06-27 17:16:25,270 INFO  [PEWorker-16] procedure2.ProcedureExecutor: 
Finished subprocedure(s) of pid=15024, ppid=15023, 
state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
ReopenTableRegionsProcedure table=table_izljd; resume parent processing.
2018-06-27 17:16:25,270 INFO  [PEWorker-16] procedure2.ProcedureExecutor: 
Finished pid=320739, ppid=15024, state=SUCCESS; MoveRegionProcedure 
hri=523007f77f96474d01d74ed3d048e173, 
source=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688, 
destination=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688
 in 497msec
2018-06-27 17:16:25,279 INFO  [PEWorker-2] procedure2.ProcedureExecutor: 
Initialized subprocedures=[{pid=320742, ppid=15024, 
state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=523007f77f96474d01d74ed3d048e173, 
source=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688, 
destination=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688}]
2018-06-27 17:16:25,291 INFO  [PEWorker-2] procedure.MasterProcedureScheduler: 
pid=320742, ppid=15024, state=RUNNABLE:MOVE_REGION_UNASSIGN; 
MoveRegionProcedure hri=523007f77f96474d01d74ed3d048e173, 
source=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688, 
destination=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688
 checking lock on 523007f77f96474d01d74ed3d048e173
2018-06-27 17:16:25,291 INFO  [PEWorker-2] procedure2.ProcedureExecutor: 
Initialized subprocedures=[{pid=320743, ppid=320742, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=table_izljd, 
region=523007f77f96474d01d74ed3d048e173, 
server=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688}]
2018-06-27 17:16:25,292 INFO  [PEWorker-2] procedure.MasterProcedureScheduler: 
pid=320743, ppid=320742, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
UnassignProcedure table=table_izljd, region=523007f77f96474d01d74ed3d048e173, 
server=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688 
checking lock on 523007f77f96474d01d74ed3d048e173
2018-06-27 17:16:25,292 INFO  [PEWorker-2] assignment.RegionStateStore: 
pid=320743 updating hbase:meta row=523007f77f96474d01d74ed3d048e173, 
regionState=CLOSING
2018-06-27 17:16:25,295 INFO  [PEWorker-2] 
assignment.RegionTransitionProcedure: Dispatch pid=320743, ppid=320742, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=table_izljd, 
region=523007f77f96474d01d74ed3d048e173, 
server=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688; 
rit=CLOSING, 
location=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688
2018-06-27 17:16:25,447 INFO  [PEWorker-15] assignment.RegionStateStore: 
pid=320743 updating hbase:meta row=523007f77f96474d01d74ed3d048e173, 
regionState=CLOSED
2018-06-27 17:16:25,471 INFO  [PEWorker-15] procedure2.ProcedureExecutor: 
Finished subprocedure(s) of pid=320742, ppid=15024, 
state=RUNNABLE:MOVE_REGION_ASSIGN; MoveRegionProcedure 
hri=523007f77f96474d01d74ed3d048e173, 
source=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688, 
destination=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688;
 resume parent processing.
2018-06-27 17:16:25,471 INFO  [PEWorker-7] procedure2.ProcedureExecutor: 
Initialized subprocedures=[{pid=320744, ppid=320742, 
state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=table_izljd, 
region=523007f77f96474d01d74ed3d048e173, 
target=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688}]
2018-06-27 17:16:25,471 INFO  [PEWorker-15] procedure2.ProcedureExecutor: 
Finished pid=320743, ppid=320742, state=SUCCESS; UnassignProcedure 
table=table_izljd, region=523007f77f96474d01d74ed3d048e173, 
server=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688 in 
159msec
2018-06-27 17:16:25,574 INFO  [PEWorker-7] procedure.MasterProcedureScheduler: 
pid=320744, ppid=320742, state=RUNNABLE:REGION_TRANSITION_QUEUE; 
AssignProcedure table=table_izljd, region=523007f77f96474d01d74ed3d048e173, 
target=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688 
checking lock on 523007f77f96474d01d74ed3d048e173
2018-06-27 17:16:25,576 INFO  [PEWorker-7] assignment.AssignProcedure: Starting 
pid=320744, ppid=320742, state=RUNNABLE:REGION_TRANSITION_QUEUE; 
AssignProcedure table=table_izljd, region=523007f77f96474d01d74ed3d048e173, 
target=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688; 
rit=OFFLINE, 
location=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688; 
forceNewPlan=false, retain=false target 
svr=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688
2018-06-27 17:16:25,726 INFO  
[master/ctr-e138-1518143905142-381863-01-000008:20000] 
balancer.BaseLoadBalancer: Reassigned 1 regions. 1 retained the pre-restart 
assignment.
2018-06-27 17:16:25,726 INFO  [PEWorker-6] assignment.RegionStateStore: 
pid=320744 updating hbase:meta row=523007f77f96474d01d74ed3d048e173, 
regionState=OPENING
2018-06-27 17:16:25,729 INFO  [PEWorker-6] 
assignment.RegionTransitionProcedure: Dispatch pid=320744, ppid=320742, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=table_izljd, 
region=523007f77f96474d01d74ed3d048e173, 
target=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688; 
rit=OPENING, 
location=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688
2018-06-27 17:16:25,894 INFO  [PEWorker-12] assignment.RegionStateStore: 
pid=320744 updating hbase:meta row=523007f77f96474d01d74ed3d048e173, 
regionState=OPEN, openSeqNum=2, 
regionLocation=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688
2018-06-27 17:16:25,900 INFO  [PEWorker-12] procedure2.ProcedureExecutor: 
Finished subprocedure(s) of pid=320742, ppid=15024, state=RUNNABLE; 
MoveRegionProcedure hri=523007f77f96474d01d74ed3d048e173, 
source=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688, 
destination=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688;
 resume parent processing.
2018-06-27 17:16:25,900 INFO  [PEWorker-12] procedure2.ProcedureExecutor: 
Finished pid=320744, ppid=320742, state=SUCCESS; AssignProcedure 
table=table_izljd, region=523007f77f96474d01d74ed3d048e173, 
target=ctr-e138-1518143905142-381863-01-000002.hwx.site,16020,1530113920688 in 
425msec
{noformat}
My guess is that this is hard for pv2 to catch: the parent never gets 
resumed; we just keep spawning the same child proc again and again 
(off of pid=15024).
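
Until pv2 can flag this itself, one rough way to spot the pattern offline is to count how many MoveRegionProcedure children finished successfully per parent pid and region. The sketch below is a hypothetical log scanner, not part of HBase. It assumes each log entry sits on a single line (unlike the wrapped paste above); the names `count_child_moves` and `suspicious` and the threshold of 3 are made up for illustration.

```python
import re
from collections import Counter

# Matches "Finished pid=..., ppid=..., state=SUCCESS; MoveRegionProcedure hri=..."
# entries like the ones in the master log above. Assumes one log entry per line.
FINISHED_MOVE = re.compile(
    r"Finished pid=(?P<pid>\d+), ppid=(?P<ppid>\d+), state=SUCCESS; "
    r"MoveRegionProcedure hri=(?P<hri>[0-9a-f]+)"
)


def count_child_moves(lines):
    """Count successful MoveRegionProcedure children per (parent pid, region)."""
    counts = Counter()
    for line in lines:
        match = FINISHED_MOVE.search(line)
        if match:
            counts[(int(match.group("ppid")), match.group("hri"))] += 1
    return counts


def suspicious(counts, threshold=3):
    """Return (ppid, hri) pairs whose count reaches the arbitrary threshold."""
    return sorted(key for key, n in counts.items() if n >= threshold)
```

Feeding the master log through `count_child_moves` and then `suspicious` would list parents such as pid=15024 that keep moving the same region. A high count is only a hint, since a legitimately busy parent could also trip the threshold.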

> Infinite loop when executing ReopenTableRegionsProcedure 
> ---------------------------------------------------------
>
>                 Key: HBASE-20817
>                 URL: https://issues.apache.org/jira/browse/HBASE-20817
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>            Reporter: Duo Zhang
>            Priority: Blocker
>             Fix For: 3.0.0, 2.1.0, 2.0.2, 2.2.0
>
>
> As discussed in HBASE-20792, it seems that a region's openSeqNum could remain 
> the same after a successful reopen, which causes the RTRP to loop infinitely.
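
The failure mode in the description can be illustrated with a toy model. This is a simplified sketch, not HBase code: it assumes the confirm step treats a region as reopened only when its openSeqNum has advanced past the value recorded before the reopen. The names `confirm_reopened` and `run_reopen` are hypothetical, and `max_rounds` exists only so that the simulation itself terminates.

```python
def confirm_reopened(old_seq_num, current_seq_num):
    """Assumed confirm rule: a reopen counts only if openSeqNum advanced."""
    return current_seq_num > old_seq_num


def run_reopen(old_seq_num, seq_num_after_reopen, max_rounds=5):
    """Simulate the reopen/confirm cycle.

    Returns the number of rounds needed to confirm the reopen, or None if
    it was never confirmed within max_rounds (a cap for this toy model only).
    """
    for round_no in range(1, max_rounds + 1):
        # One round: spawn a child move, after which the region is open again.
        current = seq_num_after_reopen(old_seq_num)
        if confirm_reopened(old_seq_num, current):
            return round_no
    return None


print(run_reopen(2, lambda seq: seq + 1))  # openSeqNum advances -> 1
print(run_reopen(2, lambda seq: seq))      # openSeqNum unchanged -> None
```

In the second call the region reopens successfully every round, yet the check never passes. That matches the log above, where each child finishes with state=SUCCESS and openSeqNum=2 while the parent keeps spawning new moves.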


