[
https://issues.apache.org/jira/browse/HBASE-21534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704339#comment-16704339
]
Duo Zhang commented on HBASE-21534:
-----------------------------------
Look at the log, it seems that we assign meta to the server which is already
marked as dead. This should be a test issue, let me provide a fix.
{noformat}
2018-11-29 20:47:49,751 INFO [PEWorker-63]
procedure.ServerCrashProcedure(131): Start pid=14,
state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure
server=localhost,102,1, splitWal=false, meta=true
2018-11-29 20:47:49,751 DEBUG [PEWorker-63] procedure2.RootProcedureState(153):
Add procedure pid=14, state=RUNNABLE:SERVER_CRASH_SPLIT_META_LOGS,
hasLock=true; ServerCrashProcedure server=localhost,102,1, splitWal=false,
meta=true as the 0th rollback step
2018-11-29 20:47:49,751 DEBUG [PEWorker-63]
procedure.MasterProcedureScheduler(351): Add ServerQueue(localhost,102,1,
xlock=false sharedLock=0 size=0) to run queue because: pid=14,
state=RUNNABLE:SERVER_CRASH_SPLIT_META_LOGS, hasLock=false;
ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true released
exclusive lock
2018-11-29 20:47:49,751 DEBUG [PEWorker-63]
procedure.MasterProcedureScheduler(351): Add ServerQueue(localhost,102,1,
xlock=false sharedLock=0 size=1) to run queue because: the exclusive lock is
not held by anyone when adding pid=14,
state=RUNNABLE:SERVER_CRASH_SPLIT_META_LOGS, hasLock=false;
ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true
2018-11-29 20:47:49,751 DEBUG [PEWorker-63]
procedure.MasterProcedureScheduler(361): Remove ServerQueue(localhost,102,1,
xlock=false sharedLock=0 size=0) from run queue because: queue is empty after
polling out pid=14, state=RUNNABLE:SERVER_CRASH_SPLIT_META_LOGS, hasLock=false;
ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true
2018-11-29 20:47:49,751 DEBUG [PEWorker-63]
procedure.MasterProcedureScheduler(361): Remove ServerQueue(localhost,102,1,
xlock=true (14) sharedLock=0 size=0) from run queue because: pid=14,
state=RUNNABLE:SERVER_CRASH_SPLIT_META_LOGS, hasLock=false;
ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true held
exclusive lock
2018-11-29 20:47:49,751 DEBUG [PEWorker-63]
procedure.ServerCrashProcedure(207): Splitting meta WALs pid=14,
state=RUNNABLE:SERVER_CRASH_SPLIT_META_LOGS, hasLock=true; ServerCrashProcedure
server=localhost,102,1, splitWal=false, meta=true
2018-11-29 20:47:49,755 INFO [PEWorker-63] master.MasterWalManager(314): Log
dir for server localhost,102,1 does not exist
2018-11-29 20:47:49,755 INFO [PEWorker-63] master.SplitLogManager(465): dead
splitlog workers [localhost,102,1]
2018-11-29 20:47:49,755 INFO [PEWorker-63] master.SplitLogManager(297):
Finished splitting (more than or equal to) 0 bytes in 0 log files in [] in 0ms
2018-11-29 20:47:49,756 DEBUG [PEWorker-63]
procedure.ServerCrashProcedure(213): Done splitting meta WALs pid=14,
state=RUNNABLE:SERVER_CRASH_SPLIT_META_LOGS, hasLock=true; ServerCrashProcedure
server=localhost,102,1, splitWal=false, meta=true
2018-11-29 20:47:49,756 DEBUG [PEWorker-63] procedure2.RootProcedureState(153):
Add procedure pid=14, state=RUNNABLE:SERVER_CRASH_ASSIGN_META, hasLock=true;
ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true as the
1th rollback step
2018-11-29 20:47:49,756 DEBUG [PEWorker-63]
procedure.MasterProcedureScheduler(351): Add ServerQueue(localhost,102,1,
xlock=false sharedLock=0 size=0) to run queue because: pid=14,
state=RUNNABLE:SERVER_CRASH_ASSIGN_META, hasLock=false; ServerCrashProcedure
server=localhost,102,1, splitWal=false, meta=true released exclusive lock
2018-11-29 20:47:49,756 DEBUG [PEWorker-63]
procedure.MasterProcedureScheduler(351): Add ServerQueue(localhost,102,1,
xlock=false sharedLock=0 size=1) to run queue because: the exclusive lock is
not held by anyone when adding pid=14, state=RUNNABLE:SERVER_CRASH_ASSIGN_META,
hasLock=false; ServerCrashProcedure server=localhost,102,1, splitWal=false,
meta=true
2018-11-29 20:47:49,756 DEBUG [PEWorker-63]
procedure.MasterProcedureScheduler(361): Remove ServerQueue(localhost,102,1,
xlock=false sharedLock=0 size=0) from run queue because: queue is empty after
polling out pid=14, state=RUNNABLE:SERVER_CRASH_ASSIGN_META, hasLock=false;
ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true
2018-11-29 20:47:49,756 DEBUG [PEWorker-63]
procedure.MasterProcedureScheduler(361): Remove ServerQueue(localhost,102,1,
xlock=true (14) sharedLock=0 size=0) from run queue because: pid=14,
state=RUNNABLE:SERVER_CRASH_ASSIGN_META, hasLock=false; ServerCrashProcedure
server=localhost,102,1, splitWal=false, meta=true held exclusive lock
2018-11-29 20:47:49,758 INFO [PEWorker-63] procedure2.ProcedureExecutor(1680):
Initialized subprocedures=[{pid=15, ppid=14,
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false;
TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN}]
2018-11-29 20:47:49,758 DEBUG [PEWorker-63] procedure2.RootProcedureState(153):
Add procedure pid=14, state=WAITING:SERVER_CRASH_GET_REGIONS, hasLock=true;
ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true as the
2th rollback step
2018-11-29 20:47:49,758 DEBUG [PEWorker-63]
procedure.MasterProcedureScheduler(351): Add TableQueue(hbase:meta, xlock=false
sharedLock=0 size=1) to run queue because: the exclusive lock is not held by
anyone when adding pid=15, ppid=14,
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false;
TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN
2018-11-29 20:47:49,758 DEBUG [PEWorker-63]
procedure.MasterProcedureScheduler(351): Add ServerQueue(localhost,102,1,
xlock=false sharedLock=0 size=0) to run queue because: pid=14,
state=WAITING:SERVER_CRASH_GET_REGIONS, hasLock=false; ServerCrashProcedure
server=localhost,102,1, splitWal=false, meta=true released exclusive lock
2018-11-29 20:47:49,758 DEBUG [PEWorker-63]
procedure.MasterProcedureScheduler(361): Remove TableQueue(hbase:meta,
xlock=false sharedLock=0 size=0) from run queue because: queue is empty after
polling out pid=15, ppid=14,
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false;
TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN
2018-11-29 20:47:49,758 INFO [PEWorker-63]
procedure.MasterProcedureScheduler(736): Took xlock for pid=15, ppid=14,
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false;
TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN
2018-11-29 20:47:49,758 INFO [PEWorker-63]
assignment.TransitRegionStateProcedure(166): Starting pid=15, ppid=14,
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true;
TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN;
rit=OPEN, location=localhost,102,1; forceNewPlan=false, retain=false
2018-11-29 20:47:49,759 DEBUG [PEWorker-63] procedure2.RootProcedureState(153):
Add procedure pid=15, ppid=14, state=RUNNABLE:REGION_STATE_TRANSITION_OPEN,
hasLock=true; TransitRegionStateProcedure table=hbase:meta, region=1588230740,
ASSIGN as the 3th rollback step
2018-11-29 20:47:49,909 INFO [mockmaster:1234]
balancer.BaseLoadBalancer(1532): Reassigned 1 regions. 1 retained the
pre-restart assignment.
2018-11-29 20:47:49,909 DEBUG [mockmaster:1234]
procedure.MasterProcedureScheduler(351): Add TableQueue(hbase:meta, xlock=false
sharedLock=1 size=1) to run queue because: pid=15, ppid=14,
state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, hasLock=true;
TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN has lock
2018-11-29 20:47:49,909 DEBUG [PEWorker-36]
procedure.MasterProcedureScheduler(361): Remove TableQueue(hbase:meta,
xlock=false sharedLock=1 size=0) from run queue because: queue is empty after
polling out pid=15, ppid=14, state=RUNNABLE:REGION_STATE_TRANSITION_OPEN,
hasLock=true; TransitRegionStateProcedure table=hbase:meta, region=1588230740,
ASSIGN
2018-11-29 20:47:49,910 INFO [PEWorker-36] procedure2.ProcedureExecutor(1680):
Initialized subprocedures=[{pid=16, ppid=15, state=RUNNABLE, hasLock=false;
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
2018-11-29 20:47:49,910 DEBUG [PEWorker-36] procedure2.RootProcedureState(153):
Add procedure pid=15, ppid=14,
state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=true;
TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN as the
4th rollback step
2018-11-29 20:47:49,910 DEBUG [PEWorker-36]
procedure.MasterProcedureScheduler(351): Add TableQueue(hbase:meta, xlock=false
sharedLock=1 size=1) to run queue because: the exclusive lock is not held by
anyone when adding pid=16, ppid=15, state=RUNNABLE, hasLock=false;
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
2018-11-29 20:47:49,910 DEBUG [PEWorker-36]
procedure.MasterProcedureScheduler(361): Remove TableQueue(hbase:meta,
xlock=false sharedLock=1 size=0) from run queue because: queue is empty after
polling out pid=16, ppid=15, state=RUNNABLE, hasLock=false;
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
2018-11-29 20:47:49,910 DEBUG [PEWorker-36] procedure2.RootProcedureState(153):
Add procedure pid=16, ppid=15, state=RUNNABLE, hasLock=true;
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure as the 5th
rollback step
2018-11-29 20:47:50,061 WARN [MockRSProcedureDispatcher-pool5-t10]
assignment.AssignmentManager(894): The region server localhost,102,1 is already
dead, skip reportRegionStateTransition call
{noformat}
> TestAssignmentManager is flakey
> -------------------------------
>
> Key: HBASE-21534
> URL: https://issues.apache.org/jira/browse/HBASE-21534
> Project: HBase
> Issue Type: Task
> Reporter: Duo Zhang
> Priority: Major
>
> See this in the outout and then the test hang
> {noformat}
> 2018-11-29 20:47:50,061 WARN [MockRSProcedureDispatcher-pool5-t10]
> assignment.AssignmentManager(894): The region server localhost,102,1 is
> already dead, skip reportRegionStateTransition call
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)