[
https://issues.apache.org/jira/browse/HBASE-27277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724658#comment-17724658
]
Duo Zhang commented on HBASE-27277:
-----------------------------------
Saw this in the test output:
{noformat}
2023-04-07T13:52:53,904 DEBUG [RegionServerTracker-0]
assignment.RegionRemoteProcedureBase(122): pid=10, ppid=7, state=RUNNABLE,
hasLock=false; OpenRegionProcedure dd97971f0a037756d8b0365e8a42cda8,
server=c17ceca693c4,45433,1680875568871 for region state=OPENING,
location=c17ceca693c4,45433,1680875568871, table=Race,
region=dd97971f0a037756d8b0365e8a42cda8, targetServer
c17ceca693c4,45433,1680875568871 is dead, SCP will interrupt us, give up
{noformat}
So the TRSP will hang there, and the UT will be blocked at line 154 forever.
I think this is a test issue. Let me think about how to fix it.
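For illustration only, here is a minimal sketch of this kind of hang. It is not the actual test code: the class LatchHangSketch and its methods are hypothetical names. One thread waits on a CountDownLatch that another thread is expected to release, and if that other thread gives up early, an unbounded await() would block forever. The sketch uses a bounded await so the give-up path is reported instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not taken from TestRaceBetweenSCPAndTRSP.
public class LatchHangSketch {

  /**
   * Runs a worker that is expected to count the latch down, then waits on the
   * latch for at most timeoutMs milliseconds.
   *
   * @param giveUp when true, the worker returns without counting down,
   *               like a procedure that abandons its work
   * @return true if the latch was released, false if the wait timed out
   */
  public static boolean runScenario(boolean giveUp, long timeoutMs) {
    CountDownLatch arrived = new CountDownLatch(1);
    Thread worker = new Thread(() -> {
      if (giveUp) {
        return; // early exit: nobody will ever release the waiter
      }
      arrived.countDown();
    });
    worker.start();
    try {
      worker.join();
      // A plain arrived.await() here would block forever on the give-up path.
      return arrived.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("normal path released: " + runScenario(false, 200));
    System.out.println("give-up path released: " + runScenario(true, 200));
  }
}
```

With a bounded wait the test can fail fast with a clear message rather than time out. Whether that is the right fix for this test is a separate question.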
> TestRaceBetweenSCPAndTRSP fails in pre commit
> ---------------------------------------------
>
> Key: HBASE-27277
> URL: https://issues.apache.org/jira/browse/HBASE-27277
> Project: HBase
> Issue Type: Bug
> Components: proc-v2
> Reporter: Duo Zhang
> Priority: Major
> Attachments:
> org.apache.hadoop.hbase.master.assignment.TestRaceBetweenSCPAndTRSP-output.txt
>
>
> It seems the PE worker is stuck here. Need to dig more.
> {noformat}
> "PEWorker-5" daemon prio=5 tid=326 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
> at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
> at
> [email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
> at
> [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
> at
> [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
> at
> [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
> at
> [email protected]/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
> at
> app//org.apache.hadoop.hbase.master.assignment.TestRaceBetweenSCPAndTRSP$AssignmentManagerForTest.getRegionsOnServer(TestRaceBetweenSCPAndTRSP.java:97)
> at
> app//org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.getRegionsOnCrashedServer(ServerCrashProcedure.java:288)
> at
> app//org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:195)
> at
> app//org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:66)
> at
> app//org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)
> at
> app//org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:919)
> at
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)
> at
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)
> at
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)
> at
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1962)
> at
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread$$Lambda$477/0x0000000800ac1840.call(Unknown
> Source)
> at
> app//org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)
> at
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1989)
> {noformat}