[ https://issues.apache.org/jira/browse/HADOOP-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476650 ]

Devaraj Das commented on HADOOP-1049:
-------------------------------------

I just looked at the code for possible race conditions. One possible case is 
an error while connecting to a server. In that case, the fields have these 
values:
socket = some valid value, inUse = 0, shouldCloseConnection = false, in = null
At this point, the connection thread is blocked in wait() (inside 
waitForWork).
Now, assuming the ConnectionCuller has not killed the connection (removed it 
from the cache), if another attempt is made to connect to the same server, the 
ref count on the connection object is incremented. The call to setupIOstreams 
notifies the connection thread that there is work to be done and returns 
immediately (because the socket is non-null). The connection thread wakes up 
and finds:
socket = some valid value, inUse = 1, shouldCloseConnection = false, in = null
So waitForWork returns "true". The next statement in the connection thread's 
run method, "in.readInt", then executes, and since "in" is null we get a 
NullPointerException.
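The interleaving above can be replayed single-threaded with a toy model. This is a hypothetical sketch, not the real Client.java: the class RaceSketch, the method reproducesNpe, and the exact condition inside waitForWork (non-null socket, inUse > 0, no pending close) are assumptions based only on the field values listed above, and the real waitForWork also blocks in wait().

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;

public class RaceSketch {
    // Hypothetical, heavily simplified model of an IPC client connection.
    // Field names follow the comment; the real class differs.
    static class Connection {
        Socket socket;                  // assigned before connect(), so non-null even on failure
        int inUse;                      // reference count of callers using this connection
        boolean shouldCloseConnection;
        DataInputStream in;             // only assigned once the IO streams are set up

        // Assumed condition for "there is work": a socket reference, at least
        // one user, and no pending close. (The real method waits first.)
        synchronized boolean waitForWork() {
            return socket != null && inUse > 0 && !shouldCloseConnection;
        }

        // Second caller: bump the ref count, then return early because the
        // socket is non-null, without ever creating the input stream.
        synchronized void setupIOstreams() {
            inUse++;
            if (socket != null) {
                notifyAll();            // wake the connection thread
                return;
            }
            // the real code would connect and create `in` here
        }
    }

    // Replays the sequence from the comment; returns true if the NPE occurs.
    static boolean reproducesNpe() {
        Connection c = new Connection();
        c.socket = new Socket();        // connect() failed, but the reference survives
        c.inUse = 0;                    // reset after maxRetries failures
        c.setupIOstreams();             // another attempt to reach the same server
        if (c.waitForWork()) {          // connection thread wakes up and sees "work"
            try {
                c.in.readInt();         // `in` was never set up
            } catch (NullPointerException e) {
                return true;
            } catch (IOException e) {
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("NPE reproduced: " + reproducesNpe()); // prints "NPE reproduced: true"
    }
}
```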

When the patch for HADOOP-312 was committed, there was no socket.connect call; 
instead, socket was always null if a connection to the server could not be 
established. A later patch (which added a connect timeout) changed this to
socket = new Socket();
socket.connect(address, timeout);
So socket now always has a non-null value, whether or not we could connect to 
the server. Unfortunately, this breaks the IPC client's logic, which treats a 
non-null socket as a connection whose streams are already set up.
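To see why the null check stops meaning anything under the new pattern, note that the Socket object exists before connect() is even attempted. The small sketch below skips the connect (so it needs no network) and compares a null check with java.net.Socket.isConnected(), a standard method that only becomes true after a successful connect. The class and helper names are hypothetical and only illustrate the state; they are not a proposed change to the client.

```java
import java.io.IOException;
import java.net.Socket;

public class SocketNullCheck {
    // The old assumption: a non-null socket means the connection was made.
    static boolean looksSetUpByNullCheck(Socket s) {
        return s != null;
    }

    // What the socket itself reports: true only after connect() succeeded.
    static boolean isActuallyConnected(Socket s) {
        return s != null && s.isConnected();
    }

    public static void main(String[] args) throws IOException {
        Socket socket = new Socket(); // first half of the new pattern
        // Suppose socket.connect(address, timeout) threw here: the reference survives.
        System.out.println(looksSetUpByNullCheck(socket)); // prints "true"
        System.out.println(isActuallyConnected(socket));   // prints "false"
        socket.close();
    }
}
```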

A fix would be to set socket to null when we cannot connect to the server 
after maxRetries retries (currently only inUse is reset to zero in that case).
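The proposed fix might look roughly like the sketch below. It is an assumption-laden illustration, not the actual patch: the Connector interface is a hypothetical stand-in for socket.connect(address, timeout) so the code runs without a server, and the retry loop is only a guess at the shape of the real one. The key line is resetting socket to null alongside inUse once maxRetries attempts have failed.

```java
import java.io.IOException;
import java.net.Socket;

public class FixSketch {
    // Hypothetical stand-in for socket.connect(address, timeout).
    interface Connector {
        void connect(Socket s) throws IOException;
    }

    static class Connection {
        Socket socket;
        int inUse;
        final int maxRetries;

        Connection(int maxRetries) {
            this.maxRetries = maxRetries;
        }

        synchronized void setupIOstreams(Connector connector) throws IOException {
            if (socket != null) {
                return; // treated as "already set up", so socket must only be non-null on success
            }
            int failures = 0;
            while (true) {
                socket = new Socket();
                try {
                    connector.connect(socket);
                    return; // the real client would create its IO streams here
                } catch (IOException e) {
                    closeQuietly(socket);
                    if (++failures >= maxRetries) {
                        inUse = 0;     // what the client already does today
                        socket = null; // proposed fix: don't leave a dead socket behind
                        throw e;
                    }
                }
            }
        }

        private static void closeQuietly(Socket s) {
            try {
                s.close();
            } catch (IOException ignored) {
                // nothing useful to do for a socket that never connected
            }
        }
    }
}
```

With socket reset to null, a later setupIOstreams call no longer returns early; it goes through the connect path again instead of waking the connection thread with no input stream.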

> race condition in setting up ipc connections
> --------------------------------------------
>
>                 Key: HADOOP-1049
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1049
>             Project: Hadoop
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.11.2
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>             Fix For: 0.12.0
>
>
> While running svn head, I get:
> [junit] 2007-02-27 19:11:17,707 INFO  ipc.Client (Client.java:run(281)) - 
> java.lang.NullPointerException
>     [junit]   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:251)
> There is a race condition between when the threads are created above and when 
> the IO streams are set up below.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
