[ 
https://issues.apache.org/jira/browse/HBASE-27277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724658#comment-17724658
 ] 

Duo Zhang commented on HBASE-27277:
-----------------------------------

I saw this in the test output:

{noformat}
2023-04-07T13:52:53,904 DEBUG [RegionServerTracker-0] 
assignment.RegionRemoteProcedureBase(122): pid=10, ppid=7, state=RUNNABLE, 
hasLock=false; OpenRegionProcedure dd97971f0a037756d8b0365e8a42cda8, 
server=c17ceca693c4,45433,1680875568871 for region state=OPENING, 
location=c17ceca693c4,45433,1680875568871, table=Race, 
region=dd97971f0a037756d8b0365e8a42cda8, targetServer 
c17ceca693c4,45433,1680875568871 is dead, SCP will interrupt us, give up
{noformat}

So the TRSP will hang there, and the UT will be blocked at line 154 forever.

I think this is a test issue. Let me think about how to fix it.
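
For illustration only, here is a minimal sketch of the general pattern visible in the stack trace quoted below: a thread parked in CountDownLatch.await() stays parked until someone calls countDown(), so if the party expected to release it gives up, the waiter never returns. The class and method names (LatchHangSketch, awaitSignal) are hypothetical and are not taken from TestRaceBetweenSCPAndTRSP. The bounded await is just one way to turn a silent hang into a visible failure; it is not a proposed fix for this issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not HBase code.
public class LatchHangSketch {

  // Waits for the latch to be released, but for at most timeoutMs.
  // Returns true if countDown() arrived in time, false on timeout or interrupt.
  public static boolean awaitSignal(CountDownLatch latch, long timeoutMs) {
    try {
      return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      // Restore the interrupt flag so callers can still observe it.
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    // Case 1: the party that should release the latch has given up,
    // so countDown() is never called. An unbounded await() would park
    // this thread forever; the bounded one reports the timeout instead.
    CountDownLatch neverReleased = new CountDownLatch(1);
    System.out.println("released without signal: " + awaitSignal(neverReleased, 200));

    // Case 2: another thread releases the latch, so the waiter proceeds.
    CountDownLatch released = new CountDownLatch(1);
    Thread signaller = new Thread(released::countDown);
    signaller.start();
    System.out.println("released with signal: " + awaitSignal(released, 5000));
    signaller.join();
  }
}
```

In a test, a false return from such a bounded wait could be turned into an assertion failure, which gives a clear message instead of a run that blocks until the surefire timeout.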

> TestRaceBetweenSCPAndTRSP fails in pre commit
> ---------------------------------------------
>
>                 Key: HBASE-27277
>                 URL: https://issues.apache.org/jira/browse/HBASE-27277
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>            Reporter: Duo Zhang
>            Priority: Major
>         Attachments: 
> org.apache.hadoop.hbase.master.assignment.TestRaceBetweenSCPAndTRSP-output.txt
>
>
> It seems the PE worker is stuck here. Need to dig more.
> {noformat}
> "PEWorker-5" daemon prio=5 tid=326 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
>         at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
>         at 
> [email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
>         at 
> [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
>         at 
> [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
>         at 
> [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
>         at 
> [email protected]/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
>         at 
> app//org.apache.hadoop.hbase.master.assignment.TestRaceBetweenSCPAndTRSP$AssignmentManagerForTest.getRegionsOnServer(TestRaceBetweenSCPAndTRSP.java:97)
>         at 
> app//org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.getRegionsOnCrashedServer(ServerCrashProcedure.java:288)
>         at 
> app//org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:195)
>         at 
> app//org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:66)
>         at 
> app//org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)
>         at 
> app//org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:919)
>         at 
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)
>         at 
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)
>         at 
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)
>         at 
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1962)
>         at 
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread$$Lambda$477/0x0000000800ac1840.call(Unknown
>  Source)
>         at 
> app//org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)
>         at 
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1989)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
