Hi Rainer,
Thanks for finding this. It isn't something I have seen in my testing. I
think this is something that needs to be fixed before the January set of
releases.
From the stack trace, it looks like the root cause is locks being
obtained in an inconsistent order - a classic deadlock.
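For illustration only, the inversion visible in the trace can be reduced to a
minimal Java sketch. The class and lock names are hypothetical stand-ins, not
Tomcat code, and ReentrantLock.tryLock with a timeout replaces the synchronized
blocks so that the example terminates instead of hanging. It also shows one
general remedy (taking the locks in the same order everywhere), which may or
may not be how this ends up being fixed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {

    // Hypothetical stand-ins for the two monitors named in the thread dump.
    static final ReentrantLock SOCKET_WRAPPER = new ReentrantLock();
    static final ReentrantLock STREAM_STATE = new ReentrantLock();

    /** Takes 'first', then tries 'second' with a timeout; counts a success if it got both. */
    static Thread worker(ReentrantLock first, ReentrantLock second,
                         CountDownLatch holdingFirst, CountDownLatch triedSecond,
                         AtomicInteger gotBoth) {
        return new Thread(() -> {
            first.lock();
            try {
                holdingFirst.countDown();
                holdingFirst.await();   // both threads now hold their first lock
                if (second.tryLock(100, TimeUnit.MILLISECONDS)) {
                    try {
                        gotBoth.incrementAndGet();
                    } finally {
                        second.unlock();
                    }
                }
                triedSecond.countDown();
                triedSecond.await();    // keep 'first' held until both have tried
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                first.unlock();
            }
        });
    }

    /** Opposite acquisition orders: neither thread can obtain its second lock. */
    public static int invertedOrder() throws InterruptedException {
        CountDownLatch holdingFirst = new CountDownLatch(2);
        CountDownLatch triedSecond = new CountDownLatch(2);
        AtomicInteger gotBoth = new AtomicInteger();
        Thread a = worker(SOCKET_WRAPPER, STREAM_STATE, holdingFirst, triedSecond, gotBoth);
        Thread b = worker(STREAM_STATE, SOCKET_WRAPPER, holdingFirst, triedSecond, gotBoth);
        a.start();
        b.start();
        a.join();
        b.join();
        return gotBoth.get();
    }

    /** Same acquisition order in both threads: each one obtains both locks in turn. */
    public static int consistentOrder() throws InterruptedException {
        AtomicInteger gotBoth = new AtomicInteger();
        Runnable task = () -> {
            SOCKET_WRAPPER.lock();
            try {
                STREAM_STATE.lock();
                try {
                    gotBoth.incrementAndGet();
                } finally {
                    STREAM_STATE.unlock();
                }
            } finally {
                SOCKET_WRAPPER.unlock();
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        return gotBoth.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("inverted order, threads that got both locks: " + invertedOrder());
        System.out.println("consistent order, threads that got both locks: " + consistentOrder());
    }
}
```

With plain synchronized blocks the inverted case would never finish, which is
what the thread dump shows for exec-5 and exec-7.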
I haven't looked at the code or the history yet so I am not sure if this
is the direct result of a recent change or if another change has just
made this easier to trigger. I plan to look at this today.
Mark
On 01/01/2022 19:07, Rainer Jung wrote:
Hi hi,
I am running the unit tests for TC 8.5.73 plus a few post-release patches
on Solaris 10 Sparc with various Java 8 JVMs. I noticed one deadlock
when running on Zulu 8.58.0.13-CA-solaris (build 1.8.0_312-b07). Maybe
it is a sporadic deadlock and could also happen on the 1.8.0 variations,
but I could not yet check that. I did not notice such a deadlock on 5
Linux distributions on which I also ran all unit tests with a variety of
JVMs, including the Zulu one.
According to the logs, the deadlock happens in
org.apache.coyote.http2.TestCancelledUpload, but
org.apache.coyote.http2.TestFlowControl was running at the same
time (two test threads). The test methods are testCancelledRequest and
testNotFound, respectively.
The stacks are:
Found one Java-level deadlock:
=============================
"http-nio-127.0.0.1-auto-1-exec-7":
  waiting to lock monitor 0x0000000100f99508 (object 0xffffffff41a99b40, a org.apache.coyote.http2.StreamStateMachine),
  which is held by "http-nio-127.0.0.1-auto-1-exec-5"
"http-nio-127.0.0.1-auto-1-exec-5":
  waiting to lock monitor 0x00000001002da838 (object 0xffffffff42015548, a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper),
  which is held by "http-nio-127.0.0.1-auto-1-exec-7"
Java stack information for the threads listed above:
===================================================
"http-nio-127.0.0.1-auto-1-exec-7":
        at org.apache.coyote.http2.StreamStateMachine.checkFrameType(StreamStateMachine.java:125)
        - waiting to lock <0xffffffff41a99b40> (a org.apache.coyote.http2.StreamStateMachine)
        at org.apache.coyote.http2.AbstractNonZeroStream.checkState(AbstractNonZeroStream.java:144)
        at org.apache.coyote.http2.Http2UpgradeHandler.startRequestBodyFrame(Http2UpgradeHandler.java:1641)
        at org.apache.coyote.http2.Http2Parser.readDataFrame(Http2Parser.java:168)
        at org.apache.coyote.http2.Http2Parser.readFrame(Http2Parser.java:95)
        at org.apache.coyote.http2.Http2Parser.readFrame(Http2Parser.java:69)
        at org.apache.coyote.http2.Http2UpgradeHandler.upgradeDispatch(Http2UpgradeHandler.java:340)
        at org.apache.coyote.http11.upgrade.UpgradeProcessorInternal.dispatch(UpgradeProcessorInternal.java:60)
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:59)
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:849)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1677)
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
        - locked <0xffffffff42015548> (a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
        at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
        at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)
"http-nio-127.0.0.1-auto-1-exec-5":
        at org.apache.coyote.http2.Http2UpgradeHandler.sendStreamReset(Http2UpgradeHandler.java:558)
        - waiting to lock <0xffffffff42015548> (a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
        at org.apache.coyote.http2.Stream.close(Stream.java:623)
        - locked <0xffffffff41a99b40> (a org.apache.coyote.http2.StreamStateMachine)
        at org.apache.coyote.http2.StreamProcessor.process(StreamProcessor.java:85)
        - locked <0xffffffff41ac4888> (a org.apache.coyote.http2.StreamProcessor)
        at org.apache.coyote.http2.StreamRunnable.run(StreamRunnable.java:35)
        at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
        at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)
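The same kind of information that jstack prints can also be obtained from
inside the JVM, which may help when trying to reproduce a sporadic hang in a
long test run. The following is a minimal sketch using the standard
java.lang.management API. The class name DeadlockProbe and its method are
hypothetical and not part of Tomcat or its test suite.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockProbe {

    /**
     * Returns one line per deadlocked thread, or an empty string if the JVM
     * reports no deadlock at the time of the call.
     */
    public static String describeDeadlocks() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Covers both object monitors and java.util.concurrent ownable synchronizers.
        long[] ids = bean.findDeadlockedThreads();
        if (ids == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : bean.getThreadInfo(ids, true, true)) {
            if (info == null) {
                continue; // thread terminated between the two calls
            }
            sb.append(info.getThreadName())
              .append(" is waiting for ").append(info.getLockName())
              .append(" held by ").append(info.getLockOwnerName())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String report = describeDeadlocks();
        System.out.println(report.isEmpty() ? "no deadlock detected" : report);
    }
}
```

A check like this could be polled from a watchdog thread while the tests run,
assuming the JVM in use supports monitor and synchronizer usage monitoring.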
I am attaching the detailed log from the start of the test cases until
the last line that was logged for either of the two deadlocked threads.
Note that unit testing proceeds for test thread 1 until the remaining
tests are done. Only testing on thread 2 stops due to the deadlock.
I will kill the process now and see whether it is reproducible.
The three added patches - I guess they are not responsible, but
mentioning them for the sake of completeness - are:
-
ThreadPoolExecutor_prestartAllCoreThreads-23c78507b5d3dc4c0bd36d263e4f99aa8221205c.patch
-
revert_previous_fix-BZ65714-07747b8ca36ffd29350af24d1c9fd05a174ba25d.patch
- improved_fix-BZ65714-4795df9bf89f84decafa276805d0c265f93eb368.patch
Thanks and regards,
Rainer
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org