Hi Daniel,

Thanks for detailed review and comment. Please find my answers inline.

Thanks,

Vyom


On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote:
Hi Vyom,

IIUC, the issue only happens when connections (clients?) to the
server are pooled. I assume the server may decide to
close what it thinks is an idle connection at any time.

correct
So what we see here is actually a race between an open
connection being retrieved from the pool, and the server
deciding to close that connection.
The connection/client is retrieved from the pool, then
unfortunately closed by the server after having been selected
for the next operation.

I checked  the code and i did not found any problem. Both "get" and "removePooledConnection" are "synchronized" so there should not be any problem(concurrency issue) here.

As you said server is closing the same connection which is retrieved from pool which  is correct but this should not throw unintentional NPE.

In LdapClient, we set the "Ldapclient.conn" to explicitly null asynchronously after we remove the connection from pool and which is creating the NPE.

LdapClient does not set `Ldapclient.conn` to null if connection is not pooled.

The question for me would be "what is the expected behavior
if the server decides to close an idle connection?"
I would expect that new InitialDirContext() should have some
way of dealing with that with retrying?
If so will leaving the 'conn' field assigned ensure that
the retry happens, or will new InitialDirContext() fail
with some other exception because the connection has
been closed? Or does the code simply assume that asynchronous
closing of the connection by the server can only happen while
it's sitting idle in the pool?

 I don't know if retrying is feasible in 'IntialDirContext' but if we leave 'LdapClient.conn' assigned then we will get "javax.naming.CommunicationException" which tells that, underline connection is closed. I am not 100% sure if this is right approach but if we set 'LdapClient.conn' to null asynchronously then we will hit NPE.

I wonder if some other mechanism could be used to reduce
and hopefully eliminate the time window in which the race
could appear. For instance taking the connection out of
the pool before closing it, instead of closing it before
taking it out of the pool, etc...

Changing order will not help, i think closing the  same connection is not a problem  but setting `LdapClient.conn'  to null asynchronously is causing NPE.

Please do let me know if i am missing something.

best regards,

-- daniel

On 04/09/2018 15:04, vyom tewari wrote:


On Friday 24 August 2018 08:52 PM, Chris Hegarty wrote:
Hi Vyom,

On 24/08/18 11:35, vyom tewari wrote:
Hi All,

Please review this simple fix below

webrev: http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html

bugid: https://bugs.openjdk.java.net/browse/JDK-8205330

This fix will resolve the race in LdapClient  where we are explicitly making "null" to LdapClient.conn.

Sorry, I don't know this code all that well, but I think
that more explanation will be needed before this code
can be reviewed.

Hi Chris, let me try to explain issue little bit.

The issue is a if ldap connection has already been established and then the LDAP directory server sends an (unsolicited) "Notice of Disconnection", the client's processing of this LDAP message can race with an application thread calling new InitialDirContext() to authenticate user credentials (i.e.bind) and throw NPE.

I did some analysis and found out that when server send an (unsolicited) "Notice of Disconnection", LdapClient.forceClose() will be called in LdapClient.processUnsolicited() which is called asynchronously by the reader thread in Connection, this means 'LdapClient.conn' may set to null anytime after it received  "Notice of Disconnection".

The LdapClient and the Connection seem to be loosely
coupled. I think part of this is to allow the LdapClient
to be GC'ed and finalized separately to the Connection
( that can be reused ). Not setting `conn` to null could
have a negative impact on this loose coupling. I also
note that the synchronization is implemented poorly in
the LdapClient, `conn` is operated on both from within
synchronized blocks and from unsynchronized blocks (
which I think is the reason you see the unexpected
null ).

I agree, not setting  'conn' to null will definitely have some impact. I check the LdapClient and it looks to me it is intentional(i may be wrong) that 'conn' is operated on both from within synchronize blocks and from unsynchronize block.

LdapClient, java doc says that connection(conn) take care of it's own sync.

##################################
    access from outside LdapClient must sync;
  *   conn - no sync; Connection takes care of its own sync
  *   unsolicited - sync Vector; multiple operations sync'ed

##################################

Please have a look and do let me know your thought on the above.

Thanks,
Vyom
-Chris.



Reply via email to