lhotari opened a new issue #10767:
URL: https://github.com/apache/pulsar/issues/10767


   **Describe the bug**
   
   There's a deadlock in the Pulsar Client on the master branch. A PR test run stalled, and the thread dump detected the following deadlock:
   
   ```
   Found one Java-level deadlock:
   =============================
   "pulsar-timer-462-1":
     waiting to lock monitor 0x00007fce080ad180 (object 0x00000000c6094a00, a org.apache.pulsar.client.impl.ConsumerImpl),
     which is held by "pulsar-client-internal-459-1"
   "pulsar-client-internal-459-1":
     waiting for ownable synchronizer 0x00000000c6094bf0, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
     which is held by "pulsar-timer-462-1"
   
   Java stack information for the threads listed above:
   ===================================================
   "pulsar-timer-462-1":
           at org.apache.pulsar.client.impl.ConsumerImpl.redeliverUnacknowledgedMessages(ConsumerImpl.java:1578)
           - waiting to lock <0x00000000c6094a00> (a org.apache.pulsar.client.impl.ConsumerImpl)
           at org.apache.pulsar.client.impl.ConsumerImpl.redeliverUnacknowledgedMessages(ConsumerImpl.java:1619)
           at org.apache.pulsar.client.impl.UnAckedMessageTracker$2.run(UnAckedMessageTracker.java:145)
           at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
           at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
           at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
           at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
           at java.lang.Thread.run([email protected]/Thread.java:829)
   "pulsar-client-internal-459-1":
           at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
           - parking to wait for  <0x00000000c6094bf0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
           at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
           at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt([email protected]/AbstractQueuedSynchronizer.java:885)
           at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued([email protected]/AbstractQueuedSynchronizer.java:917)
           at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/AbstractQueuedSynchronizer.java:1240)
           at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock([email protected]/ReentrantReadWriteLock.java:959)
           at org.apache.pulsar.client.impl.UnAckedMessageTracker.add(UnAckedMessageTracker.java:180)
           at org.apache.pulsar.client.impl.ConsumerImpl.trackMessage(ConsumerImpl.java:1385)
           at org.apache.pulsar.client.impl.ConsumerImpl.trackMessage(ConsumerImpl.java:1369)
           at org.apache.pulsar.client.impl.ConsumerImpl.messageProcessed(ConsumerImpl.java:1362)
           - locked <0x00000000c6094a00> (a org.apache.pulsar.client.impl.ConsumerImpl)
           at org.apache.pulsar.client.impl.ConsumerImpl.lambda$internalBatchReceiveAsync$5(ConsumerImpl.java:483)
           at org.apache.pulsar.client.impl.ConsumerImpl$$Lambda$1271/0x0000000100ac0c40.run(Unknown Source)
           at java.util.concurrent.Executors$RunnableAdapter.call([email protected]/Executors.java:515)
           at java.util.concurrent.FutureTask.run([email protected]/FutureTask.java:264)
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run([email protected]/ScheduledThreadPoolExecutor.java:304)
           at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1128)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
           at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
           at java.lang.Thread.run([email protected]/Thread.java:829)
   
   Found 1 deadlock.
   ```
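   The two threads take the same two locks in opposite orders. The internal executor thread holds the `ConsumerImpl` monitor (taken in `messageProcessed`) and waits for the `ReentrantReadWriteLock` write lock in `UnAckedMessageTracker.add`, while the dump reports that write lock as held by the timer thread, which in turn waits for the `ConsumerImpl` monitor in `redeliverUnacknowledgedMessages`. The minimal sketch below reproduces this lock-order inversion outside Pulsar. The classes `Consumer` and `Tracker` and the methods `messageProcessed`, `redeliver`, `add` and `timerTask` are hypothetical simplifications, not Pulsar's actual code; a `CountDownLatch` forces the bad interleaving, and `ThreadMXBean.findDeadlockedThreads()` reports the same kind of monitor/ownable-synchronizer cycle that the thread dump shows.
   
   ```java
   import java.lang.management.ManagementFactory;
   import java.lang.management.ThreadMXBean;
   import java.util.Arrays;
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.locks.ReentrantReadWriteLock;
   
   public class LockOrderDeadlockDemo {
   
       // Hypothetical stand-in for ConsumerImpl; its intrinsic monitor plays the
       // role of the ConsumerImpl monitor in the thread dump.
       static final class Consumer {
           final Tracker tracker = new Tracker(this);
   
           // Mirrors messageProcessed -> trackMessage: consumer monitor first,
           // then the tracker's write lock.
           synchronized void messageProcessed(CountDownLatch bothHoldFirstLock) throws InterruptedException {
               bothHoldFirstLock.countDown();
               bothHoldFirstLock.await();
               tracker.add();
           }
   
           // Mirrors redeliverUnacknowledgedMessages: needs the consumer monitor.
           synchronized void redeliver() {
           }
       }
   
       // Hypothetical stand-in for UnAckedMessageTracker.
       static final class Tracker {
           final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
           final Consumer consumer;
   
           Tracker(Consumer consumer) {
               this.consumer = consumer;
           }
   
           void add() {
               rwLock.writeLock().lock();
               try {
                   // recording the message id is omitted in this sketch
               } finally {
                   rwLock.writeLock().unlock();
               }
           }
   
           // Mirrors the timer task: write lock first, then the consumer monitor
           // (the opposite order from messageProcessed).
           void timerTask(CountDownLatch bothHoldFirstLock) throws InterruptedException {
               rwLock.writeLock().lock();
               try {
                   bothHoldFirstLock.countDown();
                   bothHoldFirstLock.await();
                   consumer.redeliver();
               } finally {
                   rwLock.writeLock().unlock();
               }
           }
       }
   
       // Returns true if the JVM reports both threads as deadlocked within 5 s.
       public static boolean reproduce() throws InterruptedException {
           Consumer consumer = new Consumer();
           // Each thread takes its first lock, then waits until the other has too.
           CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
           Thread timer = new Thread(() -> {
               try { consumer.tracker.timerTask(bothHoldFirstLock); } catch (InterruptedException ignored) { }
           }, "timer");
           Thread internal = new Thread(() -> {
               try { consumer.messageProcessed(bothHoldFirstLock); } catch (InterruptedException ignored) { }
           }, "internal");
           // Daemon threads, so the JVM can still exit while they stay deadlocked.
           timer.setDaemon(true);
           internal.setDaemon(true);
           timer.start();
           internal.start();
   
           ThreadMXBean mx = ManagementFactory.getThreadMXBean();
           long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
           while (System.nanoTime() < deadline) {
               long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
               if (ids != null
                       && Arrays.stream(ids).anyMatch(id -> id == timer.getId())
                       && Arrays.stream(ids).anyMatch(id -> id == internal.getId())) {
                   return true;
               }
               Thread.sleep(50);
           }
           return false;
       }
   
       public static void main(String[] args) throws InterruptedException {
           System.out.println("deadlock detected: " + reproduce());
       }
   }
   ```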
   
   Full thread dump: https://gist.github.com/lhotari/1bbcc43e850bd7d62891ba7fe3724b0b
   Thread dump in the jstack.review UI: https://jstack.review/?https://gist.github.com/lhotari/1bbcc43e850bd7d62891ba7fe3724b0b#tda_1_dump
   
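   The test that was executing was ConsumerBatchReceiveTest. Its main thread was stuck in `PulsarClientImpl.shutdown()`, where `HashedWheelTimer.stop()` waits to join the timer's worker thread: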
   ```
   "main" #1 prio=5 os_prio=0 cpu=13468.61ms elapsed=6534.24s tid=0x00007fce64027800 nid=0xca9 in Object.wait()  [0x00007fce6a11e000]
      java.lang.Thread.State: TIMED_WAITING (on object monitor)
           at java.lang.Object.wait([email protected]/Native Method)
           - waiting on <no object reference available>
           at java.lang.Thread.join([email protected]/Thread.java:1308)
           - waiting to re-lock in wait() <0x00000000c4246738> (a io.netty.util.concurrent.FastThreadLocalThread)
           at io.netty.util.HashedWheelTimer.stop(HashedWheelTimer.java:383)
           at org.apache.pulsar.client.impl.PulsarClientImpl.shutdown(PulsarClientImpl.java:730)
           at org.apache.pulsar.broker.auth.MockedPulsarServiceBaseTest.internalCleanup(MockedPulsarServiceBaseTest.java:192)
           at org.apache.pulsar.client.api.ConsumerBatchReceiveTest.cleanup(ConsumerBatchReceiveTest.java:48)
           at org.apache.pulsar.tests.TestRetrySupport.stateCheck(TestRetrySupport.java:52)
           at jdk.internal.reflect.GeneratedMethodAccessor129.invoke(Unknown Source)
           at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke([email protected]/DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke([email protected]/Method.java:566)
           at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
           at org.testng.internal.MethodInvocationHelper.invokeMethodConsideringTimeout(MethodInvocationHelper.java:61)
           at org.testng.internal.ConfigInvoker.invokeConfigurationMethod(ConfigInvoker.java:366)
           at org.testng.internal.ConfigInvoker.invokeConfigurations(ConfigInvoker.java:320)
           at org.testng.internal.TestInvoker.runConfigMethods(TestInvoker.java:701)
           at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:527)
           at org.testng.internal.TestInvoker.retryFailed(TestInvoker.java:214)
           at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:58)
           at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822)
           at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147)
           at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
           at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
           at org.testng.TestRunner$$Lambda$219/0x0000000100448c40.accept(Unknown Source)
           at java.util.ArrayList.forEach([email protected]/ArrayList.java:1541)
           at org.testng.TestRunner.privateRun(TestRunner.java:764)
           at org.testng.TestRunner.run(TestRunner.java:585)
           at org.testng.SuiteRunner.runTest(SuiteRunner.java:384)
           at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378)
           at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337)
           at org.testng.SuiteRunner.run(SuiteRunner.java:286)
           at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
           at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96)
           at org.testng.TestNG.runSuitesSequentially(TestNG.java:1218)
           at org.testng.TestNG.runSuitesLocally(TestNG.java:1140)
           at org.testng.TestNG.runSuites(TestNG.java:1069)
           at org.testng.TestNG.run(TestNG.java:1037)
           at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:135)
           at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeSingleClass(TestNGDirectoryTestSuite.java:112)
           at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeLazy(TestNGDirectoryTestSuite.java:123)
           at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:90)
           at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:146)
           at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
           at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
           at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
           at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
   ```
   
   **Expected behavior**
   
   Pulsar Client shouldn't deadlock.
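   
   One common way to rule out this class of deadlock is to never call back into the consumer while holding the tracker's lock: collect the expired message ids under the write lock, release it, and only then call the synchronized redelivery method. The sketch below applies that idea to hypothetical `Consumer` and `Tracker` classes standing in for `ConsumerImpl` and `UnAckedMessageTracker`; the class and method names are illustrative assumptions, and this is not necessarily how Pulsar fixed this issue.
   
   ```java
   import java.util.ArrayList;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.locks.ReentrantReadWriteLock;
   
   public class LockOrderFixDemo {
   
       // Hypothetical stand-in for ConsumerImpl.
       static final class Consumer {
           final Tracker tracker = new Tracker(this);
           private final List<Integer> redelivered = new ArrayList<>();
   
           // Consumer monitor first, then the tracker's write lock.
           synchronized void messageProcessed(int messageId) {
               tracker.add(messageId);
           }
   
           // Needs the consumer monitor; never called while holding the tracker lock.
           synchronized void redeliver(Set<Integer> messageIds) {
               redelivered.addAll(messageIds);
           }
   
           synchronized int redeliveredCount() {
               return redelivered.size();
           }
       }
   
       // Hypothetical stand-in for UnAckedMessageTracker.
       static final class Tracker {
           private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
           private final Set<Integer> pending = new HashSet<>();
           private final Consumer consumer;
   
           Tracker(Consumer consumer) {
               this.consumer = consumer;
           }
   
           void add(int messageId) {
               rwLock.writeLock().lock();
               try {
                   pending.add(messageId);
               } finally {
                   rwLock.writeLock().unlock();
               }
           }
   
           // Timer task: snapshot and clear the pending ids under the write lock,
           // release it, and only then call into the consumer.
           void timerTask() {
               Set<Integer> expired;
               rwLock.writeLock().lock();
               try {
                   expired = new HashSet<>(pending);
                   pending.clear();
               } finally {
                   rwLock.writeLock().unlock();
               }
               if (!expired.isEmpty()) {
                   consumer.redeliver(expired);
               }
           }
       }
   
       // Runs both paths concurrently; returns the number of redelivered ids,
       // or -1 if either thread is still running after 10 s (i.e. it hung).
       public static int run(int iterations) throws InterruptedException {
           Consumer consumer = new Consumer();
           Thread internal = new Thread(() -> {
               for (int i = 0; i < iterations; i++) {
                   consumer.messageProcessed(i);
               }
           }, "internal");
           Thread timer = new Thread(() -> {
               for (int i = 0; i < iterations; i++) {
                   consumer.tracker.timerTask();
               }
           }, "timer");
           internal.setDaemon(true);
           timer.setDaemon(true);
           internal.start();
           timer.start();
           internal.join(TimeUnit.SECONDS.toMillis(10));
           timer.join(TimeUnit.SECONDS.toMillis(10));
           if (internal.isAlive() || timer.isAlive()) {
               return -1;
           }
           consumer.tracker.timerTask(); // flush ids added after the timer's last pass
           return consumer.redeliveredCount();
       }
   
       public static void main(String[] args) throws InterruptedException {
           System.out.println(run(100_000)); // prints 100000
       }
   }
   ```
   
   Because the timer path releases the write lock before it acquires the consumer monitor, the only nested acquisition left is monitor-then-write-lock, so no lock-order cycle can form, and every tracked id is redelivered exactly once.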
   