[ https://issues.apache.org/jira/browse/HBASE-21051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allan Yang updated HBASE-21051:
-------------------------------
    Description: 
Similar to HBASE-20921, the ModifyTable procedure and the reopen procedure do not
hold the lock, so other procedures such as split/merge can execute at the same time.
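
To make the locking gap concrete, here is a toy model in plain Java. It is not HBase code: the Semaphore standing in for the region lock and the step names are assumptions for illustration. A parent procedure runs two steps against one region. If it releases the lock between the steps, a split can acquire the lock in the gap, which is the window this issue describes. A Semaphore is used rather than a ReentrantLock because a single-threaded replay would otherwise re-enter its own lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

/**
 * Toy model (not HBase code): a parent procedure runs two steps on one region.
 * If it drops the region lock between steps, a split can run in the gap.
 */
public class LockGapDemo {
    /** Returns the order in which operations touched the region. */
    static List<String> run(boolean holdLockAcrossSteps) {
        Semaphore regionLock = new Semaphore(1); // one permit = exclusive region lock
        List<String> trace = new ArrayList<>();

        regionLock.acquireUninterruptibly();     // parent, step 1: plan the reopen/move
        trace.add("reopen:plan");
        if (!holdLockAcrossSteps) {
            regionLock.release();                // lock dropped between steps
        }

        // A split request arrives between the parent's steps.
        if (regionLock.tryAcquire()) {
            trace.add("split");                  // split runs to completion in the gap
            regionLock.release();
        } else {
            trace.add("split:waits");            // split must wait for the parent to finish
        }

        if (!holdLockAcrossSteps) {
            regionLock.acquireUninterruptibly(); // parent, step 2: re-take the lock
        }
        trace.add("reopen:unassign");            // acts on whatever the region is now
        regionLock.release();
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [reopen:plan, split, reopen:unassign]
        System.out.println(run(true));  // [reopen:plan, split:waits, reopen:unassign]
    }
}
```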

1. A split happened during ModifyTable. As the log shows, the split was nearly
complete:
{code}
2018-08-05 01:28:31,339 INFO  [PEWorker-8] procedure2.ProcedureExecutor(1659): Finished subprocedure(s) of pid=772, state=RUNNABLE:SPLIT_TABLE_REGION_POST_OPERATION, hasLock=true; SplitTableRegionProcedure table=IntegrationTestBigLinkedList, parent=357a7a6a62c76bc2d7ab30a6cc812637, daughterA=b13e5d155b65a5f752f3adda78fcfb6a, daughterB=5be3aadcee68d91c3d1e464865550246; resume parent processing.
2018-08-05 01:28:31,345 INFO  [PEWorker-8] procedure2.ProcedureExecutor(1296): Finished pid=795, ppid=772, state=SUCCESS, hasLock=false; AssignProcedure table=IntegrationTestBigLinkedList, region=b13e5d155b65a5f752f3adda78fcfb6a, target=e010125048016.bja,60020,1533402809226 in 5.0280sec
{code}

2. The reopen procedure began reopening the region by moving it:
{code}
2018-08-05 01:28:31,389 INFO  [PEWorker-11] procedure.MasterProcedureScheduler(631): pid=781, ppid=774, state=RUNNABLE:MOVE_REGION_UNASSIGN, hasLock=false; MoveRegionProcedure hri=357a7a6a62c76bc2d7ab30a6cc812637, source=e010125048016.bja,60020,1533402809226, destination=e010125048016.bja,60020,1533402809226 checking lock on 357a7a6a62c76bc2d7ab30a6cc812637
2018-08-05 01:28:31,390 INFO  [PEWorker-3] procedure2.ProcedureExecutor(1296): Finished pid=772, state=SUCCESS, hasLock=false; SplitTableRegionProcedure table=IntegrationTestBigLinkedList, parent=357a7a6a62c76bc2d7ab30a6cc812637, daughterA=b13e5d155b65a5f752f3adda78fcfb6a, daughterB=5be3aadcee68d91c3d1e464865550246 in 21.9050sec
2018-08-05 01:28:31,518 INFO  [PEWorker-11] procedure2.ProcedureExecutor(1533): Initialized subprocedures=[{pid=797, ppid=781, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=false; UnassignProcedure table=IntegrationTestBigLinkedList, region=357a7a6a62c76bc2d7ab30a6cc812637, server=e010125048016.bja,60020,1533402809226}]
2018-08-05 01:28:31,530 INFO  [PEWorker-15] procedure.MasterProcedureScheduler(631): pid=797, ppid=781, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=false; UnassignProcedure table=IntegrationTestBigLinkedList, region=357a7a6a62c76bc2d7ab30a6cc812637, server=e010125048016.bja,60020,1533402809226 checking lock on 357a7a6a62c76bc2d7ab30a6cc812637
{code}

3. MoveRegionProcedure (via its child UnassignProcedure, pid=797) failed because
the region no longer existed (it had been split):
{code}
2018-08-05 01:28:31,543 ERROR [PEWorker-15] procedure2.ProcedureExecutor(1517): CODE-BUG: Uncaught runtime exception: pid=797, ppid=781, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure table=IntegrationTestBigLinkedList, region=357a7a6a62c76bc2d7ab30a6cc812637, server=e010125048016.bja,60020,1533402809226
java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
        at org.apache.hadoop.hbase.master.assignment.RegionStates.getOrCreateServer(RegionStates.java:1097)
        at org.apache.hadoop.hbase.master.assignment.RegionStates.addRegionToServer(RegionStates.java:1125)
        at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1455)
        at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:204)
        at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:349)
        at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:101)
        at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:873)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1498)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1278)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1785)
{code}
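
The top frame is ConcurrentHashMap.get, which throws NullPointerException when passed a null key. This means RegionStates.getOrCreateServer was most likely handed a null server name, presumably because the split had already closed the parent and cleared its location. The sketch below replays the logged order on a simplified, hypothetical model of the master's bookkeeping. RegionNode, serverMap and the method bodies are assumptions, not the real RegionStates/AssignmentManager code. In the replay, the split clears the parent's location, and then the move's unassign, which was planned before the split, looks up that null location.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified, hypothetical model of the master's region bookkeeping (not the HBase classes). */
public class StaleUnassignDemo {
    /** Per-region state; location is null once the region is closed/offline. */
    static final class RegionNode {
        volatile String location;
        RegionNode(String location) { this.location = location; }
    }

    // server name -> number of regions the master believes it hosts
    final Map<String, Integer> serverMap = new ConcurrentHashMap<>();
    final Map<String, RegionNode> regions = new ConcurrentHashMap<>();

    /** Split: close the parent (clearing its location) and open two daughters on the same server. */
    void split(String parent, String daughterA, String daughterB) {
        RegionNode p = regions.get(parent);
        String server = p.location;
        p.location = null;
        regions.put(daughterA, new RegionNode(server));
        regions.put(daughterB, new RegionNode(server));
    }

    /** Mirrors the shape of markRegionAsClosing -> addRegionToServer -> getOrCreateServer. */
    void markRegionAsClosing(String region) {
        String server = regions.get(region).location; // null for a split parent
        Integer count = serverMap.get(server);         // ConcurrentHashMap.get(null) throws NPE
        serverMap.put(server, count == null ? 1 : count + 1);
    }

    /** Replays the logged order: move planned, split finishes, unassign runs. True if an NPE occurs. */
    static boolean replay() {
        StaleUnassignDemo m = new StaleUnassignDemo();
        String rs = "e010125048016.bja,60020,1533402809226";
        String parent = "357a7a6a62c76bc2d7ab30a6cc812637";
        m.regions.put(parent, new RegionNode(rs));
        m.split(parent, "b13e5d155b65a5f752f3adda78fcfb6a", "5be3aadcee68d91c3d1e464865550246");
        try {
            m.markRegionAsClosing(parent); // the move's unassign still targets the split parent
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(replay() ? "NullPointerException" : "no exception");
    }
}
```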

We need to think through this case and find a definitive solution for it;
otherwise, issues like this one and HBASE-20921 will keep coming.
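
One possible direction is sketched here purely as an assumption, not as the fix adopted in HBase. The idea is to serialize split and unassign on a per-region lock, and have the unassign re-validate the region's state after acquiring that lock instead of trusting a plan made before the split. The logs above show MoveRegionProcedure pid=781 checking the lock on the parent region within a millisecond of split pid=772 finishing. A lock alone therefore only serializes the two procedures; the re-check is what stops the unassign from acting on a region that no longer exists. The ReentrantLock per region, the State enum and tryMarkClosing are hypothetical stand-ins for the master's procedure locks and region states.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch: split and unassign take the same per-region lock, and the
 * unassign re-checks region state under that lock before acting.
 */
public class GuardedUnassign {
    enum State { OPEN, CLOSING, SPLIT }

    static final class RegionNode {
        final ReentrantLock lock = new ReentrantLock();
        State state = State.OPEN; // guarded by lock
        String location;          // guarded by lock; null once split
        RegionNode(String location) { this.location = location; }
    }

    final Map<String, RegionNode> regions = new ConcurrentHashMap<>();

    /** Split marks the parent SPLIT and clears its location, under the region lock. */
    void split(String parent) {
        RegionNode p = regions.get(parent);
        p.lock.lock();
        try {
            p.state = State.SPLIT;
            p.location = null;
        } finally {
            p.lock.unlock();
        }
    }

    /** Returns false (skipping the unassign) if the region is gone or no longer open. */
    boolean tryMarkClosing(String region) {
        RegionNode node = regions.get(region);
        if (node == null) {
            return false;                         // unknown region: nothing to unassign
        }
        node.lock.lock();
        try {
            if (node.state != State.OPEN || node.location == null) {
                return false;                     // split/merged/closed meanwhile
            }
            node.state = State.CLOSING;
            return true;
        } finally {
            node.lock.unlock();
        }
    }

    public static void main(String[] args) {
        GuardedUnassign m = new GuardedUnassign();
        m.regions.put("parent", new RegionNode("rs1"));
        m.regions.put("other", new RegionNode("rs1"));
        m.split("parent");
        System.out.println(m.tryMarkClosing("parent")); // false: skipped, no NPE
        System.out.println(m.tryMarkClosing("other"));  // true
    }
}
```

Returning false lets the caller skip or re-plan the move instead of crashing the procedure worker; whether skipping is the right semantics for a reopen after a split is a separate design question.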

> Possible NPE if ModifyTable and region split happen at the same time
> --------------------------------------------------------------------
>
>                 Key: HBASE-21051
>                 URL: https://issues.apache.org/jira/browse/HBASE-21051
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2
>    Affects Versions: 2.1.0, 2.0.1
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
