[jira] [Commented] (HDFS-14146) Handle exception from internalQueueCall

Erik Krogen (JIRA) Wed, 12 Dec 2018 15:49:30 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719557#comment-16719557
 ]


Erik Krogen commented on HDFS-14146:
------------------------------------

Hey [~csun], this is a good find. I actually think this problem is more serious 
than just the issue you raised.

Normally, _reader_ threads attempt to offer a request to the {{callQueue}}. If 
the {{callQueue}} is full, they will block, creating a natural backoff by 
pushing back on the listen queue, which will eventually cause clients to be 
unable to connect. However, now we have the _handler_ threads attempting to 
offer a request to the {{callQueue}} -- but the handler threads are the same 
ones that drain the queue. If the queue becomes full while every handler thread 
is blocked trying to push a request into the {{callQueue}}, the result is 
deadlock: all handlers are waiting on the queue because it is full, and no 
handler is left to drain it.
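
To make the hazard concrete, here is a minimal single-threaded sketch using a 
plain {{java.util.concurrent.ArrayBlockingQueue}} as a stand-in for the 
{{callQueue}}. The class name, the capacity of 1, and the 100 ms bounded wait 
(used in place of {{put()}} so the example terminates) are assumptions for 
illustration only, not Hadoop code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not part of the Hadoop IPC server.
class RequeueStall {
  // Simulates the only handler trying to re-queue a call into a full queue.
  // Returns true if the re-queue succeeded within the bounded wait.
  static boolean handlerCanRequeue() {
    BlockingQueue<String> callQueue = new ArrayBlockingQueue<>(1);
    try {
      callQueue.put("call-1");           // a reader enqueues a call
      String taken = callQueue.take();   // the only handler takes it
      callQueue.put("call-2");           // a reader refills the queue: full again
      // The handler now tries to re-queue. It is the only consumer, so
      // nothing can free a slot while it waits. A real put() would block
      // forever here; the timeout just lets the sketch finish.
      return callQueue.offer(taken, 100, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("re-queued: " + handlerCanRequeue());
  }
}
```

With more handlers the same thing happens once all of them are re-queueing at 
the same moment.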

I think that instead of {{callQueue.put()}} we need to use {{callQueue.add()}}, 
which will never block, and instead throws an overflow exception if the queue is 
full. We should also make sure the exception handling is the same as when it 
happens in the reader thread, which currently looks like this (taken from 
{{processOneRpc}}):
{code}
        ...
        } else {
          processRpcRequest(header, buffer);
        }
      } catch (RpcServerException rse) {
        // inform client of error, but do not rethrow else non-fatal
        // exceptions will close connection!
        if (LOG.isDebugEnabled()) {
          LOG.debug(Thread.currentThread().getName() +
              ": processOneRpc from client " + this +
              " threw exception [" + rse + "]");
        }
        // use the wrapped exception if there is one.
        Throwable t = (rse.getCause() != null) ? rse.getCause() : rse;
        final RpcCall call = new RpcCall(this, callId, retry);
        setupResponse(call,
            rse.getRpcStatusProto(), rse.getRpcErrorCodeProto(), null,
            t.getClass().getName(), t.getMessage());
        sendResponse(call);
      }
{code}
It's not clear to me if the logic you added will match this; can you confirm?
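
For reference, the {{put()}} versus {{add()}} distinction I am relying on is the 
standard {{BlockingQueue}} contract. The sketch below uses a plain 
{{ArrayBlockingQueue}} as a stand-in, where a full queue makes {{add()}} throw 
{{IllegalStateException}}; the class and method names are made up for the 
example, and the exact exception type surfaced by the real {{callQueue}} is not 
shown here:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration; not part of the Hadoop IPC server.
class AddVersusPut {
  // Fills a capacity-1 queue, then attempts a second non-blocking add().
  // Returns true if the second add() was rejected with an exception.
  static boolean addRejectedWhenFull() {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
    queue.add("first");                  // succeeds: queue is now full
    try {
      queue.add("second");               // never blocks; throws when full
      return false;
    } catch (IllegalStateException full) {
      // This is the point where the caller can build an error response
      // for the client instead of silently dropping the call.
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("add rejected when full: " + addRejectedWhenFull());
  }
}
```

The caller gets control back immediately, so the handler thread can report the 
failure to the client and go back to draining the queue.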

> Handle exception from internalQueueCall
> ---------------------------------------
>
>                 Key: HDFS-14146
>                 URL: https://issues.apache.org/jira/browse/HDFS-14146
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ipc
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Critical
>         Attachments: HDFS-14146-HDFS-12943.000.patch
>
>
> When we re-queue RPC call, the {{internalQueueCall}} will potentially throw 
> exceptions (e.g., RPC backoff), which is then swallowed. This will cause the 
> RPC to be silently discarded without response to the client, which is not 
> good.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)