On 20.09.2013 14:46, Mark Thomas wrote: > On 20/09/2013 11:02, Rainer Jung wrote: >> Having run some sporadically failing tests in a loop under the three >> connectors on Solaris 10 Sparc using Java 1.7.0_40. Code is trunk >> r1524838 (current as of this morning and codewise identical to 8.0.0 RC3). > > Great. This is really helpful. If you could repeat the test with the > various fixes in place to see if things have got any better that would > be great.
I ran the test I listed in the original mails again for bio, nio and apr, each 50 times. This time the system was idle during the test run. Only two of the tests I mentioned still fail occasionally: TestWebappClassLoaderExecutorMemoryLeak: About 6-10 out of 50 runs failed. In addition I had again had one test run hanging during stop, this time for the apr connector not the nio one. TestCoyoteAdapter: onyl failures with apr, two runs out of 50. Details below. I will do some investigations and report back. See below for some detail on the hang. Thanks for fixing the other ones and improving this one! >> The hang for nio in the leak detection test looks suspect (see below), >> the other failures might be uncritical timing issues: > > Looking at the code, timing issues in the tests look very likely. I've > committed a fix for the timing issues. > > Regarding the loop, any ideas on the root cause? I'll investigate but the hang is not a loop. The main thread is waiting for the main ThreadPoolExecutor lock of the timer pool when trying to shut it down to prevent the memory leak: "main" prio=3 tid=0x00029800 nid=0x2 waiting on condition [0xfdf7d000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xe6db92b8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197) at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290) at java.util.concurrent.ThreadPoolExecutor.shutdownNow(ThreadPoolExecutor.java:1420) at org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads(WebappClassLoader.java:2046) at org.apache.catalina.loader.WebappClassLoader.clearReferences(WebappClassLoader.java:1722) at org.apache.catalina.loader.WebappClassLoader.stop(WebappClassLoader.java:1637) at org.apache.catalina.loader.WebappLoader.stopInternal(WebappLoader.java:491) at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232) ... and three of the timer threads wait for it as well: "pool-1-thread-1" prio=3 tid=0x00944800 nid=0x16 waiting on condition [0xb3a7f000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xe6db92b8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197) at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290) at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:998) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) ... No indication what holds the lock or why it wasn't freed after use. The source code of the ThreadPoolExecutor in JDK 1.7.0_40 doesn't show a chance of not releasing the lock when leaving the method that grabs it, it is always locked and then unlocked in a finally clause. But no thread in the dump sits in a block which should hold the lock. I'll do some hunting, probably by also instrumenting ThreadPoolExecutor, ReentrantLock and/or AbstractQueuedSynchronizer. Concerning TestCoyoteAdapter: ... 21-Sep-2013 19:43:06.801 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["http-apr-127.0.0.1-auto-13-33900"] 21-Sep-2013 19:43:16.890 SEVERE [http-apr-127.0.0.1-auto-13-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Poller failed with error [81] : [File descriptor in bad state] 21-Sep-2013 19:43:26.900 INFO [main] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-apr-127.0.0.1-auto-13-33900"] ... ... Testcase: testPathParamsRedirect took 21.482 sec Caused an ERROR Unexpected end of file from server java.net.SocketException: Unexpected end of file from server at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:772) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633) at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468) at org.apache.catalina.startup.TomcatBaseTest.methodUrl(TomcatBaseTest.java:247) at org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:219) at org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:213) at org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:202) at org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:196) at org.apache.catalina.connector.TestCoyoteAdapter.testPath(TestCoyoteAdapter.java:137) at org.apache.catalina.connector.TestCoyoteAdapter.testPathParamsRedirect(TestCoyoteAdapter.java:116) Testcase: testPathParmsFooSessionDummyValue took 1.32 sec ... The SEVERE method is specific for the failure, also the duration above 20 seconds. The duration count (in seconds) of the 150 tests was: Count Duration 74 0 23 1 3 2 50 3 1 20 1 21 So all test runs below 4 seconds except for the two failing runs. I'll let that one run longer and will recheck, whether failures only happen for apr. Regards, Rainer --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org