On 20.09.2013 14:46, Mark Thomas wrote:
> On 20/09/2013 11:02, Rainer Jung wrote:
>> Having run some sporadically failing tests in a loop under the three
>> connectors on Solaris 10 Sparc using Java 1.7.0_40. Code is trunk
>> r1524838 (current as of this morning and codewise identical to 8.0.0 RC3).
>
> Great. This is really helpful. If you could repeat the test with the
> various fixes in place to see if things have got any better that would
> be great.
I ran the test I listed in the original mails again for bio, nio and
apr, each 50 times. This time the system was idle during the test run.
Only two of the tests I mentioned still fail occasionally:
TestWebappClassLoaderExecutorMemoryLeak: About 6-10 out of 50 runs
failed. In addition I had again had one test run hanging during stop,
this time for the apr connector not the nio one.
TestCoyoteAdapter: onyl failures with apr, two runs out of 50. Details
below.
I will do some investigations and report back. See below for some detail
on the hang.
Thanks for fixing the other ones and improving this one!
>> The hang for nio in the leak detection test looks suspect (see below),
>> the other failures might be uncritical timing issues:
>
> Looking at the code, timing issues in the tests look very likely. I've
> committed a fix for the timing issues.
>
> Regarding the loop, any ideas on the root cause?
I'll investigate but the hang is not a loop. The main thread is waiting
for the main ThreadPoolExecutor lock of the timer pool when trying to
shut it down to prevent the memory leak:
"main" prio=3 tid=0x00029800 nid=0x2 waiting on condition [0xfdf7d000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0xe6db92b8> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
at
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
at
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
at
java.util.concurrent.ThreadPoolExecutor.shutdownNow(ThreadPoolExecutor.java:1420)
at
org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads(WebappClassLoader.java:2046)
at
org.apache.catalina.loader.WebappClassLoader.clearReferences(WebappClassLoader.java:1722)
at
org.apache.catalina.loader.WebappClassLoader.stop(WebappClassLoader.java:1637)
at
org.apache.catalina.loader.WebappLoader.stopInternal(WebappLoader.java:491)
at
org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
...
and three of the timer threads wait for it as well:
"pool-1-thread-1" prio=3 tid=0x00944800 nid=0x16 waiting on condition
[0xb3a7f000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0xe6db92b8> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
at
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
at
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
at
java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:998)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
...
No indication what holds the lock or why it wasn't freed after use. The
source code of the ThreadPoolExecutor in JDK 1.7.0_40 doesn't show a
chance of not releasing the lock when leaving the method that grabs it,
it is always locked and then unlocked in a finally clause. But no thread
in the dump sits in a block which should hold the lock. I'll do some
hunting, probably by also instrumenting ThreadPoolExecutor,
ReentrantLock and/or AbstractQueuedSynchronizer.
Concerning TestCoyoteAdapter:
...
21-Sep-2013 19:43:06.801 INFO [main]
org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler
["http-apr-127.0.0.1-auto-13-33900"]
21-Sep-2013 19:43:16.890 SEVERE [http-apr-127.0.0.1-auto-13-Poller]
org.apache.tomcat.util.net.AprEndpoint$Poller.run Poller failed with
error [81] : [File descriptor in bad state]
21-Sep-2013 19:43:26.900 INFO [main]
org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler
["http-apr-127.0.0.1-auto-13-33900"]
...
...
Testcase: testPathParamsRedirect took 21.482 sec
Caused an ERROR
Unexpected end of file from server
java.net.SocketException: Unexpected end of file from server
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:772)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
at
java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
at
org.apache.catalina.startup.TomcatBaseTest.methodUrl(TomcatBaseTest.java:247)
at
org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:219)
at
org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:213)
at
org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:202)
at
org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:196)
at
org.apache.catalina.connector.TestCoyoteAdapter.testPath(TestCoyoteAdapter.java:137)
at
org.apache.catalina.connector.TestCoyoteAdapter.testPathParamsRedirect(TestCoyoteAdapter.java:116)
Testcase: testPathParmsFooSessionDummyValue took 1.32 sec
...
The SEVERE method is specific for the failure, also the duration above
20 seconds. The duration count (in seconds) of the 150 tests was:
Count Duration
74 0
23 1
3 2
50 3
1 20
1 21
So all test runs below 4 seconds except for the two failing runs. I'll
let that one run longer and will recheck, whether failures only happen
for apr.
Regards,
Rainer
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]