[ http://issues.apache.org/jira/browse/DIRSERVER-586?page=comments#action_12425489 ]
Jörg Henne commented on DIRSERVER-586:
--------------------------------------
Thanks for your continued feedback, Emmanuel!
I'll answer your points one-by-one:
1) Even other threads within my test case can continue their work undisturbed.
Connections from other sources are also no problem at all. The symptom is
simply that some connections seem to just go dead.
To give you an idea of how many are affected: I usually run the test with 10
threads, each executing about 200 interactions with the server (100 object
creations, 100 deletions). Of those 10 threads usually about 1-3 run into the
hang.
As stated earlier: when a connection enters the hung state, the corresponding
channel is no longer returned from Selector.select() calls. My earlier
observation that the channel is completely lost from the selector's channel
list was bunk, by the way. It is still there, but simply not selected. This
may very well be a problem with the runtime libraries or even the LDAP client.
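To make the "still registered, but never selected" distinction concrete, here is a minimal sketch of how it could be checked without a debugger. The class SelectorKeyDump and its methods are made up for illustration, and the tiny loopback connection in demo() only stands in for a real server's selector loop; none of this is taken from the server code.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SelectorKeyDump {

    /** Summarizes how many keys are registered and how many are currently selected. */
    static String describe(Selector selector) {
        return "registered=" + selector.keys().size()
             + " selected=" + selector.selectedKeys().size();
    }

    /** Illustrative only: one loopback connection, described before and after the client writes. */
    static String demo() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.socket().bind(new InetSocketAddress(loopback, 0)); // any free port
            int port = server.socket().getLocalPort();
            try (SocketChannel client = SocketChannel.open(new InetSocketAddress(loopback, port));
                 SocketChannel accepted = server.accept()) {
                accepted.configureBlocking(false);
                accepted.register(selector, SelectionKey.OP_READ);

                // Nothing has been written yet: the key is registered but not selected.
                selector.selectNow();
                String idle = describe(selector);

                // After the client writes, a healthy selector reports the key as selected.
                client.write(ByteBuffer.wrap(new byte[] {42}));
                selector.select(2000);
                String ready = describe(selector);

                return idle + " / " + ready;
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a real server one would call something like describe() on each pass of the select loop. A hung connection would then show up as a key that stays counted under "registered" but never under "selected", even though the client has written to it.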
2) Good idea, but still: hangs as before.
3)
JRockit: hey, I've wanted to try this for a long time. Now it's time to do so.
Test 1: Server on JRockit, client (unit test) on SUN: still hangs.
Test 2: Server on JRockit, client on JRockit: no hang. What was that? Several
tries: IT! DOESN'T! HANG! wow.
Test 3: Server on SUN, client on JRockit: still no hang.
Interesting.
IBM JVM:
Test 1: server on SUN, client on IBM: hang!
Test 2: server on IBM, client on IBM: hang!
Observation on the side: the test runs 3-4 times slower on IBM and SUN JVMs
(even though some threads don't even make it to the end due to a hang!)
compared to JRockit. The effect on the server side seems to be far less
pronounced, which might be due to the log output on the client side.
While we're at it, some completely unscientific benchmarks. The client always
runs on JRockit (it is the only JVM on which the client always makes it to the
end, so it doesn't make sense to compare other JVMs for the client), and each
test is run multiple times to allow for some JIT burn-in:
- SUN 1.5.0_07: ~5000ms per test run.
- IBM 1.5: ~5700ms
- JRockit: ~4000ms (OMG!)
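For reference, the "run multiple times to allow for JIT burn-in" approach could look roughly like the sketch below. TimedRuns, its method names and the dummy workload are placeholders, not the code used for the figures above; a real run would call the actual test in place of the dummy loop.

```java
import java.util.Arrays;

public class TimedRuns {

    /** Runs the task warmupRuns times untimed, then returns the elapsed ms of each measured run. */
    static long[] measure(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // gives the JIT a chance to compile hot paths
        }
        long[] millis = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        return millis;
    }

    /** Middle element of the sorted values (the upper median for even lengths). */
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Dummy workload standing in for one complete test run.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 200_000; i++) {
                sb.append(i % 10);
            }
            if (sb.length() == 0) {
                throw new AssertionError("unreachable");
            }
        };
        long[] runs = measure(workload, 3, 5);
        System.out.println("runs (ms): " + Arrays.toString(runs) + ", median: " + median(runs));
    }
}
```

Reporting the median rather than a single run makes the number a little less sensitive to one slow outlier, though it stays just as unscientific.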
4) The thread dump is not a problem. I have both client and server running
under full debugger control and can plainly see what all the threads are doing.
A TCP capture would be very, very interesting, but I don't know how I can
capture traffic which doesn't actually cross a physical network interface.
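One possible workaround might be to skip capturing at the network level and instead put a small logging relay between client and server, then point the test's URL at the relay's port. The following is only a rough sketch of that idea: the class name, both port numbers and the log format are assumptions made for illustration, and nothing here is part of ApacheDS or the attached test.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class LoggingRelay {

    /** Formats the first length bytes as lowercase hex pairs separated by spaces. */
    static String hex(byte[] data, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    /** Copies bytes from in to out until end of stream, logging every chunk with a timestamp. */
    static void pump(String label, InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            System.out.println(System.currentTimeMillis() + " " + label + " " + hex(buffer, n));
            out.write(buffer, 0, n);
            out.flush();
        }
    }

    /** Relays one direction; when it ends, both sockets are closed so the other direction stops too. */
    static void relay(String label, Socket from, Socket to) {
        try {
            pump(label, from.getInputStream(), to.getOutputStream());
        } catch (IOException e) {
            System.out.println(label + " ended: " + e.getMessage());
        } finally {
            closeQuietly(from);
            closeQuietly(to);
        }
    }

    static void closeQuietly(Socket socket) {
        try {
            socket.close();
        } catch (IOException ignored) {
            // nothing useful to do while tearing down
        }
    }

    public static void main(String[] args) throws IOException {
        int listenPort = 10390; // assumption: a free local port the client connects to
        int targetPort = 10389; // assumption: the port the directory server listens on
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(listenPort, 50, loopback)) {
            while (true) {
                Socket client = listener.accept();
                Socket server = new Socket(loopback, targetPort);
                String id = "conn-" + client.getPort();
                new Thread(() -> relay(id + " C->S", client, server)).start();
                new Thread(() -> relay(id + " S->C", server, client)).start();
            }
        }
    }
}
```

The obvious caveat is that a relay changes the timing and the socket code path on both sides, so the hang might not reproduce through it. If it does, the log would at least show whether the last request of a hung connection ever got an answer.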
5) Unfortunately, I don't have any chickens at hand (lucky them!), but to draw
some conclusion: one possible explanation would be that the problems are caused
by the different IO libraries used by the different JVMs (see thread dumps
below). The cause could also be a problem on the server side which is triggered
by certain timing differences between the client JVMs. However, the
former seems more likely to me, because the hangs don't seem to be influenced
by the client timing itself. In fact, I first got the hangs using my OLM
(object to LDAP mapping) framework which surely has very, very different timing
characteristics.
Here's a stack dump of a JRockit reader thread:
Thread [Thread-4] (Suspended)
    owns: java.io.BufferedInputStream (id=51)
    jrockit.net.SocketNativeIO.readBytesPinned(int, byte[], int, int, int) line: not available [native method]
    jrockit.net.SocketNativeIO.socketRead(java.io.FileDescriptor, byte[], int, int, int) line: not available
    java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) line: not available
    java.net.SocketInputStream.read(byte[], int, int) line: 129
    java.io.BufferedInputStream.fill() line: 218
    java.io.BufferedInputStream.read1(byte[], int, int) line: 256
    java.io.BufferedInputStream.read(byte[], int, int) line: 313
    com.sun.jndi.ldap.Connection.run() line: 784
    java.lang.Thread.run() line: not available
This is from the IBM JVM:
Thread [Thread-17] (Suspended)
    owns: java.io.BufferedInputStream (id=45)
    java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) line: not available [native method]
    java.net.SocketInputStream.read(byte[], int, int) line: 155
    java.io.BufferedInputStream.fill() line: 229
    java.io.BufferedInputStream.read1(byte[], int, int) line: 267
    java.io.BufferedInputStream.read(byte[], int, int) line: 324
    com.sun.jndi.ldap.Connection.run() line: 814
    java.lang.Thread.run() line: 788
And this is, finally, the SUN JVM:
Thread [Thread-31] (Suspended)
    owns: java.io.BufferedInputStream (id=60)
    java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) line: not available [native method]
    java.net.SocketInputStream.read(byte[], int, int) line: 129
    java.io.BufferedInputStream.fill() line: 218
    java.io.BufferedInputStream.read1(byte[], int, int) line: 256
    java.io.BufferedInputStream.read(byte[], int, int) line: 313
    com.sun.jndi.ldap.Connection.run() line: 784
    java.lang.Thread.run() line: 595
I'm not saying that this specific class is the culprit - it is rather the
write-side of the communication which is the problem, but the stack dumps
indicate that JRockit has very different socket-IO code compared to SUN/IBM.
Wild guess: IBM licensed
> Reliable hang of DS during query
> --------------------------------
>
> Key: DIRSERVER-586
> URL: http://issues.apache.org/jira/browse/DIRSERVER-586
> Project: Directory ApacheDS
> Issue Type: Bug
> Environment: DS 0.9.3, Windows, JDK 1.5
> Reporter: Jörg Henne
> Assigned To: Alex Karasulu
> Attachments: bugreport.zip, TestHang.java
>
>
> When running the attached test, the directory server hangs after executing a
> slew of operations when searching for objects.
> First of all, some background on the test case:
> The attached test case (in the form of an exported eclipse project) is,
> unfortunately, based on quite a few classes. They are part of a project I am
> currently working on: an object to ldap mapper with a similar approach as
> castor for XML or hibernate for RDBMS, albeit a lot more modest in complexity
> (I'll, hopefully, one day be able to open-source it - for now it is still
> much too immature). I have supplied all that stuff mainly for your reference.
> To run the test case, please make sure that the constant "URL" in
> LDAPDirectoryTest points to a valid directory. The URL the context points to
> must exist. It will, however, subsequently create lots of nodes below it.
> The hang seems to be related to some kind of deadlock, since it doesn't occur
> once the whole test is run via a single context only. To achieve this, set
> the constant "ONE_CONTEXT" to true (each LDAPDirectory uses its own set of
> contexts).
> If you have any problems running the test, please don't hesitate to contact
> me.
--
This message is automatically generated by JIRA.