Hi Daniel,
Thanks for detailed review and comment. Please find my answers inline.
Thanks,
Vyom
On Tuesday 04 September 2018 08:13 PM, Daniel Fuchs wrote:
Hi Vyom,
IIUC, the issue only happens when connections (clients?) to the
server are pooled. I assume the server may decide to
close what it thinks is an idle connection at any time.
correct
So what we see here is actually a race between an open
connection being retrieved from the pool, and the server
deciding to close that connection.
The connection/client is retrieved from the pool, then
unfortunately closed by the server after having been selected
for the next operation.
I checked the code and i did not found any problem. Both "get" and
"removePooledConnection" are "synchronized" so there should not be any
problem(concurrency issue) here.
As you said server is closing the same connection which is retrieved
from pool which is correct but this should not throw unintentional NPE.
In LdapClient, we set the "Ldapclient.conn" to explicitly null
asynchronously after we remove the connection from pool and which is
creating the NPE.
LdapClient does not set `Ldapclient.conn` to null if connection is not
pooled.
The question for me would be "what is the expected behavior
if the server decides to close an idle connection?"
I would expect that new InitialDirContext() should have some
way of dealing with that with retrying?
If so will leaving the 'conn' field assigned ensure that
the retry happens, or will new InitialDirContext() fail
with some other exception because the connection has
been closed? Or does the code simply assume that asynchronous
closing of the connection by the server can only happen while
it's sitting idle in the pool?
I don't know if retrying is feasible in 'IntialDirContext' but if we
leave 'LdapClient.conn' assigned then we will get
"javax.naming.CommunicationException" which tells that, underline
connection is closed. I am not 100% sure if this is right approach but
if we set 'LdapClient.conn' to null asynchronously then we will hit NPE.
I wonder if some other mechanism could be used to reduce
and hopefully eliminate the time window in which the race
could appear. For instance taking the connection out of
the pool before closing it, instead of closing it before
taking it out of the pool, etc...
Changing order will not help, i think closing the same connection is
not a problem but setting `LdapClient.conn' to null asynchronously is
causing NPE.
Please do let me know if i am missing something.
best regards,
-- daniel
On 04/09/2018 15:04, vyom tewari wrote:
On Friday 24 August 2018 08:52 PM, Chris Hegarty wrote:
Hi Vyom,
On 24/08/18 11:35, vyom tewari wrote:
Hi All,
Please review this simple fix below
webrev:
http://cr.openjdk.java.net/~vtewari/8205330/webrev0.0/index.html
bugid: https://bugs.openjdk.java.net/browse/JDK-8205330
This fix will resolve the race in LdapClient where we are
explicitly making "null" to LdapClient.conn.
Sorry, I don't know this code all that well, but I think
that more explanation will be needed before this code
can be reviewed.
Hi Chris, let me try to explain issue little bit.
The issue is a if ldap connection has already been established and
then the LDAP directory server sends an (unsolicited) "Notice of
Disconnection", the client's processing of this LDAP message can race
with an application thread calling new InitialDirContext() to
authenticate user credentials (i.e.bind) and throw NPE.
I did some analysis and found out that when server send an
(unsolicited) "Notice of Disconnection", LdapClient.forceClose() will
be called in LdapClient.processUnsolicited() which is called
asynchronously by the reader thread in Connection, this means
'LdapClient.conn' may set to null anytime after it received "Notice
of Disconnection".
The LdapClient and the Connection seem to be loosely
coupled. I think part of this is to allow the LdapClient
to be GC'ed and finalized separately to the Connection
( that can be reused ). Not setting `conn` to null could
have a negative impact on this loose coupling. I also
note that the synchronization is implemented poorly in
the LdapClient, `conn` is operated on both from within
synchronized blocks and from unsynchronized blocks (
which I think is the reason you see the unexpected
null ).
I agree, not setting 'conn' to null will definitely have some
impact. I check the LdapClient and it looks to me it is intentional(i
may be wrong) that 'conn' is operated on both from within synchronize
blocks and from unsynchronize block.
LdapClient, java doc says that connection(conn) take care of it's own
sync.
##################################
access from outside LdapClient must sync;
* conn - no sync; Connection takes care of its own sync
* unsolicited - sync Vector; multiple operations sync'ed
##################################
Please have a look and do let me know your thought on the above.
Thanks,
Vyom
-Chris.