[ 
https://issues.apache.org/jira/browse/HBASE-20817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878622#comment-16878622
 ] 

Jacobo Coll commented on HBASE-20817:
-------------------------------------

Hi team,

 

I'm not sure if I should open a new ticket or not.

I am seeing this same issue on Hortonworks HBase 
[2.0.2.3.1.2.1-1|https://repo.hortonworks.com/content/repositories/releases/org/apache/hbase/hbase-server/2.0.2.3.1.2.1-1/]
 , where it should already be fixed (I've checked that this patch was applied to 
that build).

Just after creating a "view" in Phoenix over an existing table, the 
"ModifyTableProcedure" triggers a "ReopenTableRegionsProcedure" that enters 
an infinite loop of "MoveRegionProcedure". The loop repeats every ~5s and 
fills up the list of procedures. The procedure WAL is never cleaned up, 
because the running procedure never finishes.

Please find below a selected portion of the hbase-master log. The affected 
table has a pre-split of 100, so the log is quite large. I've shortened some 
lines with dots.

 
{noformat}
2019-07-03 16:12:27,924 INFO [PEWorker-8] procedure2.ProcedureExecutor: 
Initialized subprocedures=[{pid=267, ppid=266, 
state=RUNNABLE:REOPEN_TABLE_REGIONS_GET_REGIONS; ReopenTableRegionsProcedure 
table=opencga_jcoll_grch38_variants}]
2019-07-03 16:12:28,059 INFO [PEWorker-2] procedure2.ProcedureExecutor: 
Initialized subprocedures=[{pid=268, ppid=267, 
state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=fdad9893526ef840d117e6bea7c04bc5, 
source=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960,
 
destination=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960},
 {pid=269, ppid=267, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=7b8b7dc99aee4f524af41a86e10ac945, 
source=wn0-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131867,
 
destination=wn0-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131867},
 
....................................................................................................,
 {pid=368, ppid=267, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=60ccd4513bc298b83d062cb0172ccba9, 
source=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960,
 
destination=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960}]
2019-07-03 16:12:28,096 INFO  [PEWorker-5] procedure.MasterProcedureScheduler: 
Took xlock for pid=268, ppid=267, state=RUNNABLE:MOVE_REGION_UNASSIGN; 
MoveRegionProcedure hri=fdad9893526ef840d117e6bea7c04bc5, 
source=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960,
 
destination=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960
2019-07-03 16:12:28,116 INFO [PEWorker-5] procedure2.ProcedureExecutor: 
Initialized subprocedures=[{pid=370, ppid=268, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
table=opencga_jcoll_grch38_variants, region=fdad9893526ef840d117e6bea7c04bc5, 
override=true, 
server=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960}]
2019-07-03 16:12:28,247 INFO [PEWorker-4] procedure.MasterProcedureScheduler: 
Took xlock for pid=370, ppid=268, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
UnassignProcedure table=opencga_jcoll_grch38_variants, 
region=fdad9893526ef840d117e6bea7c04bc5, override=true, 
server=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960
2019-07-03 16:12:28,280 INFO [PEWorker-4] assignment.RegionTransitionProcedure: 
Dispatch pid=370, ppid=268, state=RUNNABLE:REGION_TRANSITION_DISPATCH, 
locked=true; UnassignProcedure table=opencga_jcoll_grch38_variants, 
region=fdad9893526ef840d117e6bea7c04bc5, override=true, 
server=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960
2019-07-03 16:12:28,659 INFO [PEWorker-13] procedure2.ProcedureExecutor: 
Finished subprocedure pid=370, resume processing parent pid=268, ppid=267, 
state=RUNNABLE:MOVE_REGION_ASSIGN, locked=true; MoveRegionProcedure 
hri=fdad9893526ef840d117e6bea7c04bc5, 
source=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960,
 
destination=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960
2019-07-03 16:12:28,659 INFO [PEWorker-13] procedure2.ProcedureExecutor: 
Finished pid=370, ppid=268, state=SUCCESS; UnassignProcedure 
table=opencga_jcoll_grch38_variants, region=fdad9893526ef840d117e6bea7c04bc5, 
override=true, 
server=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960
 in 458msec, unfinishedSiblingCount=0
2019-07-03 16:12:28,662 INFO [PEWorker-8] procedure2.ProcedureExecutor: 
Initialized subprocedures=[{pid=408, ppid=268, 
state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure 
table=opencga_jcoll_grch38_variants, region=fdad9893526ef840d117e6bea7c04bc5, 
target=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960}]
2019-07-03 16:12:28,687 INFO [PEWorker-8] procedure.MasterProcedureScheduler: 
Took xlock for pid=408, ppid=268, state=RUNNABLE:REGION_TRANSITION_QUEUE; 
AssignProcedure table=opencga_jcoll_grch38_variants, 
region=fdad9893526ef840d117e6bea7c04bc5, 
target=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960
2019-07-03 16:12:28,713 INFO [PEWorker-8] assignment.AssignProcedure: Starting 
pid=408, ppid=268, state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; 
AssignProcedure table=opencga_jcoll_grch38_variants, 
region=fdad9893526ef840d117e6bea7c04bc5, 
target=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960;
 rit=OFFLINE, 
location=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960;
 forceNewPlan=false, retain=false target 
svr=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960
2019-07-03 16:12:28,909 INFO [PEWorker-13] 
assignment.RegionTransitionProcedure: Dispatch pid=408, ppid=268, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; AssignProcedure 
table=opencga_jcoll_grch38_variants, region=fdad9893526ef840d117e6bea7c04bc5, 
target=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960
2019-07-03 16:12:29,435 INFO [PEWorker-13] procedure2.ProcedureExecutor: 
Finished subprocedure pid=408, resume processing parent pid=268, ppid=267, 
state=RUNNABLE, locked=true; MoveRegionProcedure 
hri=fdad9893526ef840d117e6bea7c04bc5, 
source=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960,
 
destination=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960
2019-07-03 16:12:29,436 INFO [PEWorker-13] procedure2.ProcedureExecutor: 
Finished pid=408, ppid=268, state=SUCCESS; AssignProcedure 
table=opencga_jcoll_grch38_variants, region=fdad9893526ef840d117e6bea7c04bc5, 
target=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960
 in 684msec, unfinishedSiblingCount=0
2019-07-03 16:12:29,494 INFO [PEWorker-14] procedure2.ProcedureExecutor: 
Finished pid=268, ppid=267, state=SUCCESS; MoveRegionProcedure 
hri=fdad9893526ef840d117e6bea7c04bc5, 
source=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960,
 
destination=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960
 in 1.3930sec, unfinishedSiblingCount=92
2019-07-03 16:12:36,744 INFO  [PEWorker-12] procedure2.ProcedureExecutor: 
Finished subprocedure pid=275, resume processing parent pid=267, ppid=266, 
state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
ReopenTableRegionsProcedure table=opencga_jcoll_grch38_variants
2019-07-03 16:12:36,744 INFO  [PEWorker-12] procedure2.ProcedureExecutor: 
Finished pid=275, ppid=267, state=SUCCESS; MoveRegionProcedure 
hri=f552eccd01cfd00bc30bec5e19f398df, 
source=wn2-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131806,
 
destination=wn2-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131806
 in 8.4340sec, unfinishedSiblingCount=0
2019-07-03 16:12:36,791 INFO  [PEWorker-5] procedure2.ProcedureExecutor: 
Initialized subprocedures=[{pid=571, ppid=267, 
state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=fdad9893526ef840d117e6bea7c04bc5, 
source=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960,
 
destination=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960},
 {pid=572, ppid=267, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=7b8b7dc99aee4f524af41a86e10ac945, 
source=wn0-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131867,
 
destination=wn0-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131867},
 {pid=573, ppid=267, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=545caf13911c04263c8f84f2c14783b7, 
source=wn4-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131742,
 
destination=wn4-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131742},
 {pid=574, ppid=267, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=6fd4397428a741d0fa67e1a2774f48d1, 
source=wn3-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131711,
 
destination=wn3-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131711},
 {pid=575, ppid=267, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=053f3ff2a77982f98bb399d60aa0942b, 
source=wn2-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131806,
 
destination=wn2-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131806},
 {pid=576, ppid=267, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=94415a23ead3e24367c12a0de1e90e28, 
source=wn4-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131742,
 
destination=wn4-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131742},
 {pid=577, ppid=267, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=eacf81f79287d01f721a352407d5a1a5, 
source=wn2-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131806,
 
destination=wn2-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131806},
 
..................................................................................................................................,
 {pid=670, ppid=267, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=941c5d8178257b7fc6bfa76b7d760468, 
source=wn4-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131742,
 
destination=wn4-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131742},
 {pid=671, ppid=267, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=60ccd4513bc298b83d062cb0172ccba9, 
source=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960,
 
destination=wn1-opencg.5w3ff4rocu0e1dpkokmkmgo5ib.zx.internal.cloudapp.net,16020,1562169131960}]

{noformat}
And then the loop starts over.

The ReopenTableRegionsProcedure is stuck at 
REOPEN_TABLE_REGIONS_CONFIRM_REOPENED, from where it starts over again, so 
this is presumably related to HBASE-20752.
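
To confirm the same pattern in another master log, a minimal Python sketch along these lines may help. It assumes the log format shown in the excerpt above, and the names `find_repeated_moves` and `SAMPLE` are hypothetical helpers, not part of HBase. It reports every region that received more than one MoveRegionProcedure under the same parent procedure, which is the symptom visible here (pid=268 and later pid=571 for the same region, both under ppid=267).

```python
import re
from collections import defaultdict

# Matches a MoveRegionProcedure in its initial state. The \s* parts tolerate
# the line wrapping introduced by mail/JIRA formatting.
MOVE_RE = re.compile(
    r"pid=(\d+),\s*ppid=(\d+),\s*"
    r"state=RUNNABLE:MOVE_REGION_UNASSIGN;\s*"
    r"MoveRegionProcedure\s+hri=([0-9a-f]+)"
)

# Abbreviated from the log above: source/destination fields are omitted,
# pids and region hashes are copied unchanged.
SAMPLE = """
Initialized subprocedures=[{pid=268, ppid=267, 
state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=fdad9893526ef840d117e6bea7c04bc5},
 {pid=269, ppid=267, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=7b8b7dc99aee4f524af41a86e10ac945}]
Took xlock for pid=268, ppid=267, state=RUNNABLE:MOVE_REGION_UNASSIGN; 
MoveRegionProcedure hri=fdad9893526ef840d117e6bea7c04bc5
Initialized subprocedures=[{pid=571, ppid=267, 
state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=fdad9893526ef840d117e6bea7c04bc5},
 {pid=572, ppid=267, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=7b8b7dc99aee4f524af41a86e10ac945},
 {pid=573, ppid=267, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
hri=545caf13911c04263c8f84f2c14783b7}]
"""


def find_repeated_moves(log_text):
    """Return {(parent_pid, region_hash): sorted move pids} for every region
    that was scheduled for a move more than once under the same parent."""
    pids = defaultdict(set)
    for pid, ppid, hri in MOVE_RE.findall(log_text):
        # A set de-duplicates the same pid appearing in several log lines
        # (e.g. "Initialized subprocedures" and "Took xlock").
        pids[(int(ppid), hri)].add(int(pid))
    return {key: sorted(val) for key, val in pids.items() if len(val) > 1}


if __name__ == "__main__":
    for (ppid, hri), moves in find_repeated_moves(SAMPLE).items():
        print(f"parent pid={ppid} region={hri} moved by pids {moves}")
```

A region listed with two or more pids under one ReopenTableRegionsProcedure indicates that the parent scheduled the reopen again after the first move had already finished successfully.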

> Infinite loop when executing ReopenTableRegionsProcedure 
> ---------------------------------------------------------
>
>                 Key: HBASE-20817
>                 URL: https://issues.apache.org/jira/browse/HBASE-20817
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 2.0.0
>            Reporter: Duo Zhang
>            Assignee: Ankit Singhal
>            Priority: Blocker
>             Fix For: 3.0.0, 2.1.0, 2.0.2
>
>         Attachments: HBASE-20817-v1.patch, HBASE-20817.patch
>
>
> As discussed in HBASE-20792, it seems that a region's openSeqNum could remain 
> the same after a successful reopen, which causes the RTRP to loop infinitely.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
