On 20.09.2013 14:46, Mark Thomas wrote:
> On 20/09/2013 11:02, Rainer Jung wrote:
>> Having run some sporadically failing tests in a loop under the three
>> connectors on Solaris 10 Sparc using Java 1.7.0_40. Code is trunk
>> r1524838 (current as of this morning and codewise identical to 8.0.0 RC3).
> 
> Great. This is really helpful. If you could repeat the test with the
> various fixes in place to see if things have got any better that would
> be great.

I ran the test I listed in the original mails again for bio, nio and
apr, each 50 times. This time the system was idle during the test run.

Only two of the tests I mentioned still fail occasionally:

TestWebappClassLoaderExecutorMemoryLeak: About 6-10 out of 50 runs
failed. In addition I had again had one test run hanging during stop,
this time for the apr connector not the nio one.

TestCoyoteAdapter: onyl failures with apr, two runs out of 50. Details
below.

I will do some investigations and report back. See below for some detail
on the hang.

Thanks for fixing the other ones and improving this one!

>> The hang for nio in the leak detection test looks suspect (see below),
>> the other failures might be uncritical timing issues:
> 
> Looking at the code, timing issues in the tests look very likely. I've
> committed a fix for the timing issues.
> 
> Regarding the loop, any ideas on the root cause?

I'll investigate but the hang is not a loop. The main thread is waiting
for the main ThreadPoolExecutor lock of the timer pool when trying to
shut it down to prevent the memory leak:

"main" prio=3 tid=0x00029800 nid=0x2 waiting on condition [0xfdf7d000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0xe6db92b8> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
        at
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
        at
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
        at
java.util.concurrent.ThreadPoolExecutor.shutdownNow(ThreadPoolExecutor.java:1420)
        at
org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads(WebappClassLoader.java:2046)
        at
org.apache.catalina.loader.WebappClassLoader.clearReferences(WebappClassLoader.java:1722)
        at
org.apache.catalina.loader.WebappClassLoader.stop(WebappClassLoader.java:1637)
        at
org.apache.catalina.loader.WebappLoader.stopInternal(WebappLoader.java:491)
        at
org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)

...

and three of the timer threads wait for it as well:

"pool-1-thread-1" prio=3 tid=0x00944800 nid=0x16 waiting on condition
[0xb3a7f000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0xe6db92b8> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
        at
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
        at
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
        at
java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:998)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

...

No indication what holds the lock or why it wasn't freed after use. The
source code of the ThreadPoolExecutor in JDK 1.7.0_40 doesn't show a
chance of not releasing the lock when leaving the method that grabs it,
it is always locked and then unlocked in a finally clause. But no thread
in the dump sits in a block which should hold the lock. I'll do some
hunting, probably by also instrumenting ThreadPoolExecutor,
ReentrantLock and/or AbstractQueuedSynchronizer.

Concerning TestCoyoteAdapter:

...
21-Sep-2013 19:43:06.801 INFO [main]
org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler
["http-apr-127.0.0.1-auto-13-33900"]
21-Sep-2013 19:43:16.890 SEVERE [http-apr-127.0.0.1-auto-13-Poller]
org.apache.tomcat.util.net.AprEndpoint$Poller.run Poller failed with
error [81] : [File descriptor in bad state]
21-Sep-2013 19:43:26.900 INFO [main]
org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler
["http-apr-127.0.0.1-auto-13-33900"]
...

...
Testcase: testPathParamsRedirect took 21.482 sec
        Caused an ERROR
Unexpected end of file from server
java.net.SocketException: Unexpected end of file from server
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:772)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
        at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
        at
java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
        at
org.apache.catalina.startup.TomcatBaseTest.methodUrl(TomcatBaseTest.java:247)
        at
org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:219)
        at
org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:213)
        at
org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:202)
        at
org.apache.catalina.startup.TomcatBaseTest.getUrl(TomcatBaseTest.java:196)
        at
org.apache.catalina.connector.TestCoyoteAdapter.testPath(TestCoyoteAdapter.java:137)
        at
org.apache.catalina.connector.TestCoyoteAdapter.testPathParamsRedirect(TestCoyoteAdapter.java:116)

Testcase: testPathParmsFooSessionDummyValue took 1.32 sec
...

The SEVERE method is specific for the failure, also the duration above
20 seconds. The duration count (in seconds) of the 150 tests was:

Count Duration
  74 0
  23 1
   3 2
  50 3
   1 20
   1 21

So all test runs below 4 seconds except for the two failing runs. I'll
let that one run longer and will recheck, whether failures only happen
for apr.

Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org

Reply via email to