[ http://issues.apache.org/jira/browse/DIRSERVER-586?page=comments#action_12425489 ]
            
Jörg Henne commented on DIRSERVER-586:
--------------------------------------

Thanks for your continued feedback, Emmanuel!

I'll answer your points one-by-one:

1) Even other threads within my test case can continue their work undisturbed. 
Connections from other sources are also no problem at all. The symptom is 
simply that some connections seem to just go dead. 
To give you an idea of how many are affected: I usually run the test with 10 
threads, each executing about 200 interactions with the server (100 object 
creations, 100 deletions). Of those 10 threads usually about 1-3 run into the 
hang.
As stated earlier: when a connection runs into the hung state, the 
corresponding channel is no longer returned from Selector.select() calls. My 
earlier observation that the channel is completely lost from the selector's 
channel list was bunk, btw. It is still there, but simply not selected. This 
may very well be a problem with the runtime libraries or even the LDAP client.
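
The "still registered, but never selected" state described above can be checked by listing every key of the selector together with its interest and ready sets. What follows is only a minimal sketch using plain java.nio. The names SelectorDump, describeKeys and opsToString are made up for illustration and are not part of ApacheDS, and the code assumes Java 8 or later.

```java
import java.nio.channels.CancelledKeyException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of ApacheDS.
final class SelectorDump {

    /** Renders an operation bit set such as OP_READ | OP_WRITE as "READ|WRITE". */
    static String opsToString(int ops) {
        List<String> names = new ArrayList<String>();
        if ((ops & SelectionKey.OP_ACCEPT) != 0) names.add("ACCEPT");
        if ((ops & SelectionKey.OP_CONNECT) != 0) names.add("CONNECT");
        if ((ops & SelectionKey.OP_READ) != 0) names.add("READ");
        if ((ops & SelectionKey.OP_WRITE) != 0) names.add("WRITE");
        return names.isEmpty() ? "NONE" : String.join("|", names);
    }

    /**
     * Returns one line per registered key. Older JDKs document the key set as
     * not thread-safe, so call this from the selector thread or while the
     * other threads are suspended in the debugger.
     */
    static List<String> describeKeys(Selector selector) {
        List<String> lines = new ArrayList<String>();
        for (SelectionKey key : selector.keys()) {
            try {
                lines.add(key.channel()
                        + " interest=" + opsToString(key.interestOps())
                        + " ready=" + opsToString(key.readyOps()));
            } catch (CancelledKeyException e) {
                // The key was cancelled but not yet removed from the key set.
                lines.add(key.channel() + " cancelled");
            }
        }
        return lines;
    }
}
```

If the key of a hung connection showed an interest set without READ, that would point towards the server side; if READ is set and the key simply never becomes ready, the data presumably never arrived at the socket.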

2) Good idea, but still: hangs as before.

3) 
JRockit: hey, I've wanted to try this for a long time. Now it's time to do so.
Test 1: Server on JRockit, client (unit test) on SUN: still hangs.
Test 2: Server on JRockit, client on JRockit: no hang. What was that? Several 
tries: IT! DOESN'T! HANG! wow.
Test 3: Server on SUN, client on JRockit: still no hang.
Interesting.

IBM JVM:
Test 1: server on SUN, client on IBM: hang!
Test 2: server on IBM, client on IBM: hang!

Observation on the side: the test runs 3-4 times slower on IBM and SUN JVMs 
(even though some threads don't even make it to the end due to a hang!) 
compared to JRockit. The effect on the server side seems to be far less 
pronounced, which might be due to the log output on the client side.

While we're at it, some completely unscientific benchmarks. The client always 
runs on JRockit (it is the only JVM on which the client reliably makes it to 
the end, so comparing other client JVMs doesn't make sense), and each test is 
run multiple times to allow for some JIT burn-in. Results by server JVM:
- SUN 1.5.0_07: ~5000ms per test run.
- IBM 1.5: ~5700ms
- JRockit: ~4000ms (OMG!)
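
The "run multiple times to allow for some JIT burn-in" approach can be sketched as a tiny harness that discards a few warm-up runs before timing. This is only an illustration, not the code used for the numbers above; WarmedUpTimer and averageMillis are hypothetical names, and the Runnable stands in for one complete test run.

```java
// Hypothetical timing helper; the task stands in for one complete test run.
final class WarmedUpTimer {

    /**
     * Runs the task warmupRuns times without timing it, then measuredRuns
     * times under a single timer, and returns the mean duration of a
     * measured run in milliseconds.
     */
    static double averageMillis(Runnable task, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        // Warm-up: gives the JIT a chance to compile the hot paths.
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1000000.0 / measuredRuns;
    }
}
```

This only warms up the JVM that executes the task, i.e. the client. A server running in a separate JVM warms up as a side effect of the requests it receives during those runs.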

4) The thread dump is not a problem. I have both client and server running 
under full debugger control and can plainly see what all the threads are doing. 
A TCP capture would be very, very interesting, but I don't know how I can 
capture traffic which doesn't actually cross a physical network interface.
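
One way to see loopback traffic without a packet sniffer might be a small logging relay: the client connects to the relay, the relay connects to the server, and every chunk is printed as hex on its way through. The following is a sketch under several assumptions: LoggingRelay and its methods are made-up names, the ports are whatever is passed on the command line, and Java 8 or later is required. It shows what the socket API delivers, not real TCP segments, and the extra hop changes the timing, so the hang may well behave differently with it in place.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical debugging aid, not part of ApacheDS or the attached test case.
final class LoggingRelay {

    /** Copies bytes from in to out until EOF or error, logging each chunk as hex. */
    static void pump(String label, InputStream in, OutputStream out) {
        byte[] buf = new byte[4096];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                StringBuilder hex = new StringBuilder();
                for (int i = 0; i < n; i++) {
                    hex.append(String.format("%02x ", buf[i] & 0xff));
                }
                System.out.println(System.currentTimeMillis() + " " + label
                        + " " + n + " bytes: " + hex.toString().trim());
                out.write(buf, 0, n);
                out.flush();
            }
        } catch (IOException e) {
            System.out.println(label + " ended: " + e.getMessage());
        }
    }

    /** Relays one client connection to the target and closes both sockets afterwards. */
    static void relay(Socket client, String targetHost, int targetPort) {
        try (Socket c = client; Socket server = new Socket(targetHost, targetPort)) {
            Thread back = new Thread(() -> {
                try {
                    pump("server->client", server.getInputStream(), c.getOutputStream());
                    c.shutdownOutput(); // pass the server's EOF on to the client
                } catch (IOException e) {
                    System.out.println("server->client ended: " + e.getMessage());
                }
            });
            back.setDaemon(true);
            back.start();
            pump("client->server", c.getInputStream(), server.getOutputStream());
            server.shutdownOutput(); // pass the client's EOF on to the server
            back.join();
        } catch (IOException | InterruptedException e) {
            System.out.println("relay ended: " + e);
        }
    }

    /** Accepts connections until the listener is closed, one relay thread per client. */
    static void serve(ServerSocket listener, String targetHost, int targetPort) throws IOException {
        while (true) {
            Socket client = listener.accept();
            Thread t = new Thread(() -> relay(client, targetHost, targetPort));
            t.setDaemon(true);
            t.start();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: LoggingRelay <listenPort> <targetHost> <targetPort>");
            return;
        }
        serve(new ServerSocket(Integer.parseInt(args[0])), args[1], Integer.parseInt(args[2]));
    }
}
```

To use it, the test's LDAP URL would have to point at the relay's listen port instead of the server's port.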

5) Unfortunately, I don't have any chickens at hand (lucky them!), but to draw 
some conclusion: one possible explanation would be that the problems are caused 
by the different IO libraries used by the different JVMs (see thread dumps 
below). The cause could also be a problem on the server side which is triggered 
by certain timing differences between the client JVMs. However, the former 
seems more likely to me, because the hangs don't seem to be influenced 
by the client timing itself. In fact, I first got the hangs using my OLM 
(object to LDAP mapping) framework which surely has very, very different timing 
characteristics.

Here's a stack dump of a JRockit reader thread:
Thread [Thread-4] (Suspended)
        owns: java.io.BufferedInputStream  (id=51)
        jrockit.net.SocketNativeIO.readBytesPinned(int, byte[], int, int, int) line: not available [native method]
        jrockit.net.SocketNativeIO.socketRead(java.io.FileDescriptor, byte[], int, int, int) line: not available
        java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) line: not available
        java.net.SocketInputStream.read(byte[], int, int) line: 129
        java.io.BufferedInputStream.fill() line: 218
        java.io.BufferedInputStream.read1(byte[], int, int) line: 256
        java.io.BufferedInputStream.read(byte[], int, int) line: 313
        com.sun.jndi.ldap.Connection.run() line: 784
        java.lang.Thread.run() line: not available

This is from the IBM JVM:
Thread [Thread-17] (Suspended)
        owns: java.io.BufferedInputStream  (id=45)
        java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) line: not available [native method]
        java.net.SocketInputStream.read(byte[], int, int) line: 155
        java.io.BufferedInputStream.fill() line: 229
        java.io.BufferedInputStream.read1(byte[], int, int) line: 267
        java.io.BufferedInputStream.read(byte[], int, int) line: 324
        com.sun.jndi.ldap.Connection.run() line: 814
        java.lang.Thread.run() line: 788

And this is, finally, the SUN JVM:
Thread [Thread-31] (Suspended)
        owns: java.io.BufferedInputStream  (id=60)
        java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) line: not available [native method]
        java.net.SocketInputStream.read(byte[], int, int) line: 129
        java.io.BufferedInputStream.fill() line: 218
        java.io.BufferedInputStream.read1(byte[], int, int) line: 256
        java.io.BufferedInputStream.read(byte[], int, int) line: 313
        com.sun.jndi.ldap.Connection.run() line: 784
        java.lang.Thread.run() line: 595


I'm not saying that this specific class is the culprit (the problem is rather 
on the write side of the communication), but the stack dumps indicate that 
JRockit has very different socket I/O code compared to SUN/IBM. 
Wild guess: IBM licensed 

> Reliable hang of DS during query
> --------------------------------
>
>                 Key: DIRSERVER-586
>                 URL: http://issues.apache.org/jira/browse/DIRSERVER-586
>             Project: Directory ApacheDS
>          Issue Type: Bug
>         Environment: DS 0.9.3, Windows, JDK 1.5
>            Reporter: Jörg Henne
>         Assigned To: Alex Karasulu
>         Attachments: bugreport.zip, TestHang.java
>
>
> When running the attached test, the directory server hangs after executing a 
> slew of operations when searching for objects.
> First of all, some background on the test case:
> The attached test case (in the form of an exported eclipse project) is, 
> unfortunately, based on quite a few classes. They are part of a project I am 
> currently working on: an object to ldap mapper with a similar approach as 
> castor for XML or hibernate for RDBMS, albeit a lot more modest in complexity 
> (I'll, hopefully, one day be able to open-source it - for now it is still 
> much too immature). I have supplied all that stuff mainly for your reference.
> To run the test case, please make sure that the constant "URL" in 
> LDAPDirectoryTest points to a valid directory. The URL the context points to 
> must exist. It will, however, subsequently create lots of nodes below it.
> The hang seems to be related to some kind of deadlock, since it doesn't occur 
> once the whole test is run via a single context only. To achieve this, set 
> the constant "ONE_CONTEXT" to true (each LDAPDirectory uses its own set of 
> contexts).
> If you have any problems running the test, please don't hesitate to contact 
> me.
