Race condition in ipc.Server prevents response being written back to client.
----------------------------------------------------------------------------

                 Key: HADOOP-2789
                 URL: https://issues.apache.org/jira/browse/HADOOP-2789
             Project: Hadoop Core
          Issue Type: Bug
          Components: ipc
    Affects Versions: 0.16.0
            Reporter: Clint Morgan
            Priority: Critical


I encountered a race condition in ipc.Server when writing the response
back to the socket. Sometimes the write SelectionKey is canceled
when it should not be, so the full response never gets
written. As a result, clients time out on the socket while waiting for the
response.

I am attaching a unit test that demonstrates the problem. It closely
follows the TestIPC test, but the socket output buffer is set
smaller than the result being sent back, so that partial writes
occur. I also put random sleeps in the client to help provoke the race
condition.
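
To illustrate what a partial write looks like, here is a minimal sketch. It
is not the attached unit test: the class and method names are hypothetical,
and a java.nio.channels.Pipe stands in for the socket so the example needs
no network. The assumption is that the pipe's OS buffer is much smaller than
the payload, which mirrors the small socket output buffer described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

// Hypothetical demo class; a Pipe stands in for the client socket.
public class PartialWriteDemo {

    /**
     * Attempts a single non-blocking write of payloadSize bytes and returns
     * how many bytes the channel accepted. Nothing reads from the pipe, so
     * the write can only take as much as the OS buffer holds.
     */
    public static int writeOnce(int payloadSize) {
        try {
            Pipe pipe = Pipe.open();
            try {
                Pipe.SinkChannel sink = pipe.sink();
                sink.configureBlocking(false);
                ByteBuffer response = ByteBuffer.allocate(payloadSize);
                return sink.write(response);
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int size = 8 * 1024 * 1024;
        int written = writeOnce(size);
        System.out.println("wrote " + written + " of " + size + " bytes");
    }
}
```

When the returned count is smaller than the payload, the remainder has to be
written later, once the channel reports that it is writable again. That is
the situation in which the write key must stay registered.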

On my machine the test fails more than half of the time.

Looking at the code in ipc.Server.java, the problem manifests in
Responder.doAsyncWrite(). If I comment out the key.cancel() line, then
everything works fine.

So we need to identify when to safely cancel the key.

I tried the following:

{noformat}
    private void doAsyncWrite(SelectionKey key) throws IOException {
      Call call = (Call)key.attachment();
      if (call == null) {
        return;
      }
      if (key.channel() != call.connection.channel) {
        throw new IOException("doAsyncWrite: bad channel");
      }
      if (processResponse(call.connection.responseQueue)) {
        synchronized(call.connection.responseQueue) {
          if (call.connection.responseQueue.size() == 0) {
            LOG.info("Cancelling key for call " + call.toString()
                     + " key: " + key.toString());
            key.cancel();          // remove item from selector.
          } else {
            LOG.warn("NOT REALLY DONE: " + call.toString()
                     + " key: " + key.toString());
          }
        }
      }
    }
{noformat}

This does catch some of the cases (e.g., the LOG.warn message gets hit), but
I still hit the race condition.
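
One possible explanation, not confirmed by anything above, is that the thread
which queues a response and registers the write key does not do both steps
under the same lock as this size check. The sketch below is a simplified
model of the pattern that would close such a window. It is not Hadoop code
and not a proposed patch: the class is hypothetical, and a boolean stands in
for the SelectionKey registration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model; writeInterest stands in for the registered write key.
public class ResponseQueueModel {
    private final Deque<String> pending = new ArrayDeque<>();
    private boolean writeInterest = false;

    /** Handler side: queue a response and request write readiness atomically. */
    public synchronized void enqueue(String response) {
        pending.addLast(response);
        writeInterest = true;
    }

    /**
     * Responder side: take one response and drop write interest only if the
     * queue is empty, all under the same lock that enqueue() uses.
     */
    public synchronized String drainOne() {
        String next = pending.pollFirst();
        if (pending.isEmpty()) {
            writeInterest = false;
        }
        return next;
    }

    public synchronized boolean hasWriteInterest() {
        return writeInterest;
    }

    public synchronized int size() {
        return pending.size();
    }
}
```

Because both methods hold the same monitor, no other thread can observe a
non-empty queue with write interest switched off. Whether this maps cleanly
onto Responder and the real selector is an open question.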


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
