[ https://issues.apache.org/jira/browse/HADOOP-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476650 ]
Devaraj Das commented on HADOOP-1049:
-------------------------------------
I looked through the code for possible race conditions. One case arises when
there is an error connecting to a server. In that case, the fields have these
values:
socket = some valid value, inUse = 0, shouldCloseConnection = false, in = null
At this point, the connection-thread is blocked in a wait() call (inside
waitForWork).
Now, assuming that the ConnectionCuller has not killed the connection (removed
the connection from the cache), if another attempt is made to connect to the
same server, the ref count is incremented on the connection object. The call to
setupIOstreams will notify the connection-thread that there is work to be done
and return immediately (as the socket is non-null). The connection-thread wakes
up and finds the values:
socket = some valid value, inUse = 1, shouldCloseConnection = false, in = null
So waitForWork returns "true". The next statement in the connection-thread's
run method then executes, which is "in.readInt". Since "in" is null, we get a
NullPointerException.
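
The sequence above can be replayed in a simplified, single-threaded model. This
is only a sketch: the class RaceSketch, the hasWork predicate and the body of
setupIOstreams are assumptions made for illustration and are not the actual
Client.java code. Only the field names and their values come from the
description above.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Simplified model of the state described above (NOT the real ipc.Client code).
class RaceSketch {
    static class Connection {
        Object socket;                  // stands in for java.net.Socket
        int inUse;
        boolean shouldCloseConnection;
        DataInputStream in;

        // Assumed guard: return immediately if a socket object already exists.
        void setupIOstreams() {
            if (socket != null) {
                return;                 // "in" is never created on this path
            }
            // (the real method would connect and create the streams here)
        }

        // Assumed wake-up condition for the connection-thread.
        boolean hasWork() {
            return inUse > 0 && !shouldCloseConnection;
        }
    }

    // Replays the steps; returns true if the read hits a null stream.
    static boolean replay() {
        Connection c = new Connection();
        c.socket = new Object();        // failed connect left a non-null socket
        c.inUse = 0;
        c.in = null;

        c.inUse++;                      // second caller takes a reference
        c.setupIOstreams();             // returns immediately, "in" stays null

        if (!c.hasWork()) {
            return false;
        }
        try {
            c.in.readInt();             // what the run method does next
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE reproduced: " + replay());
    }
}
```

In the real client the two steps run on different threads, so the model only
shows that the field values alone are enough to reach the null dereference.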
When the patch for HADOOP-312 was committed, there was no socket.connect call;
socket was always null if the connection to the server in question could not
be established. A later patch changed this behaviour (to add a timeout) to:
socket = new Socket();
socket.connect(address, timeout);
So, irrespective of whether we could connect to the server, socket now always
has a valid non-null value. Unfortunately, this breaks the logic of the IPC
client, which treats a non-null socket as an already established connection.
A fix for this would be to set socket to null if we could not connect to the
server after maxRetries attempts (currently only inUse is set to zero when
this condition becomes true).
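
A minimal sketch of that fix, assuming a retry loop of roughly this shape. The
class ConnectFixSketch, the Connector interface and the setup method are
hypothetical names used only for illustration; Connector stands in for
"socket.connect(address, timeout)" so the example runs without a network.

```java
import java.io.IOException;
import java.net.Socket;

// Hypothetical sketch of the proposed fix (not the committed patch).
class ConnectFixSketch {
    // Stand-in for socket.connect(address, timeout); may throw on failure.
    interface Connector {
        void connect(Socket socket) throws IOException;
    }

    static class Connection {
        Socket socket;
        int inUse;

        // Returns true once connected; after maxRetries failures it resets
        // both inUse and socket so later callers do not see a stale socket.
        boolean setup(Connector connector, int maxRetries) {
            for (int failures = 0; failures < maxRetries; failures++) {
                socket = new Socket();      // non-null before connecting
                try {
                    connector.connect(socket);
                    return true;
                } catch (IOException e) {
                    closeQuietly(socket);   // discard the unconnected socket
                }
            }
            inUse = 0;                      // what the code does today
            socket = null;                  // the proposed addition
            return false;
        }

        private static void closeQuietly(Socket s) {
            try {
                s.close();
            } catch (IOException ignored) {
                // nothing useful to do for a socket that never connected
            }
        }
    }

    public static void main(String[] args) {
        Connection c = new Connection();
        c.inUse = 1;
        boolean ok = c.setup(s -> { throw new IOException("refused"); }, 3);
        System.out.println("connected=" + ok + ", socket=" + c.socket);
    }
}
```

With socket reset to null, a later setupIOstreams call would no longer take
the early-return path and would attempt the connection again.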
> race condition in setting up ipc connections
> --------------------------------------------
>
> Key: HADOOP-1049
> URL: https://issues.apache.org/jira/browse/HADOOP-1049
> Project: Hadoop
> Issue Type: Bug
> Components: ipc
> Affects Versions: 0.11.2
> Reporter: Owen O'Malley
> Assigned To: Owen O'Malley
> Fix For: 0.12.0
>
>
> While running svn head, I get:
> [junit] 2007-02-27 19:11:17,707 INFO ipc.Client (Client.java:run(281)) -
> java.lang.NullPointerException
> [junit] at org.apache.hadoop.ipc.Client$Connection.run(Client.java:251)
> There is a race condition between when the threads are created above and when
> the IO streams are set up below.