[jira] [Comment Edited] (HDFS-14146) Handle exception from internalQueueCall

Erik Krogen (JIRA) Wed, 12 Dec 2018 16:29:09 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719587#comment-16719587
 ]


Erik Krogen edited comment on HDFS-14146 at 12/13/18 12:28 AM:
---------------------------------------------------------------

{quote}
I couldn't come up with an example (It would be great if you have one) that 
causing all the handlers to stuck yet, but certainly can see that some handlers 
get blocked when putting the call back, which is undesirable.
{quote}
Let's imagine that the call queue is completely full (with many 
requests that have a state ID in the future) and the listen queue is backing 
up. All of your handler threads start to grab items from the queue; for each 
item taken from the queue, the readers add a new item, since the listen queue 
has backed up. By the time any of the handler threads tries to call 
{{callQueue.put()}}, the call queue is already completely full again because 
the readers have been inserting new items, so every handler blocks in 
{{put()}} and no thread is left to drain the queue. I think the scenario is 
pretty unlikely, since there are many more handler threads than readers and 
the readers would have to consistently win races against the handlers to add 
items, but still, a NameNode deadlock is no joke :)
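
To make the interleaving concrete, here is a minimal single-threaded sketch of 
it. It is not the Hadoop code: a plain {{java.util.concurrent.ArrayBlockingQueue}} 
stands in for the call queue, and the class and method names are made up for 
illustration. It uses the non-blocking {{offer()}} so that the failed re-queue 
is visible as a return value; a blocking {{put()}} at the same point would 
simply never return.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class RequeueRaceSketch {
    /**
     * Replays the interleaving on one thread and reports whether the handler
     * could put its call back without blocking.
     */
    static boolean handlerCanRequeue(int capacity) {
        // Stand-in for the call queue (assumption: a simple bounded queue).
        BlockingQueue<String> callQueue = new ArrayBlockingQueue<>(capacity);

        // The queue starts out full of calls whose state ID is in the future.
        for (int i = 0; i < capacity; i++) {
            callQueue.offer("future-state-call-" + i);
        }

        // A handler takes one call and finds it cannot be served yet.
        String call = callQueue.poll();

        // Before the handler re-queues it, a reader fills the freed slot
        // from the backed-up listen queue.
        callQueue.offer("call-from-listen-queue");

        // The handler now tries to put the call back. offer() fails fast;
        // put() would block here until some other thread drained the queue.
        return callQueue.offer(call);
    }

    public static void main(String[] args) {
        // Prints "handler re-queued its call: false"
        System.out.println("handler re-queued its call: " + handlerCanRequeue(4));
    }
}
```

If every handler ends up at that last step at the same time, nobody is left 
to take items off the queue, which is the deadlock I am worried about.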


was (Author: xkrogen):
{quote}
I couldn't come up with an example (It would be great if you have one) that 
causing all the handlers to stuck yet, but certainly can see that some handlers 
get blocked when putting the call back, which is undesirable.
{quote}
Let's imagine that the call queue currently is completely full (with many 
requests which have a state ID in the future) and the listen queue is backing 
up. All of your handler threads start to grab items from the queue; for each 
item taken from the queue, the readers add new items, since the listen queue 
has backed up. By the time any of the handler threads try to call 
{{callQueue.put()}}, the call queue is already completely full because the 
readers have been inserting new items. I think the scenario is pretty unlikely 
since there are many more handler threads than readers, and the readers would 
have to consistently win races against the handlers to add items, but still, a 
NameNode deadlock is no joke :)

{quote}
Yes I compared this code with Call#doResponse - they are doing the same thing.
{quote}
Did you see the edit to my last comment? I believe that the {{RpcStatusProto}} 
returned may be different, which can have implications for client behavior.

> Handle exception from internalQueueCall
> ---------------------------------------
>
>                 Key: HDFS-14146
>                 URL: https://issues.apache.org/jira/browse/HDFS-14146
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ipc
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Critical
>         Attachments: HDFS-14146-HDFS-12943.000.patch
>
>
> When we re-queue an RPC call, {{internalQueueCall}} may throw exceptions 
> (e.g., RPC backoff), which are then swallowed. This causes the RPC to be 
> silently discarded without a response to the client, which is not good.
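
The problem described above can be sketched roughly as follows. This is only 
an illustration under stated assumptions: {{CallQueue}}, {{requeue}}, and the 
response string are hypothetical names and do not correspond to the actual 
Hadoop IPC classes or to the attached patch. The point is only that the 
failure from re-queueing is turned into a response for the client rather than 
being swallowed.

```java
import java.util.Optional;

public class RequeueErrorSketch {
    /** Hypothetical stand-in for a queue that may reject a call, e.g. under RPC backoff. */
    interface CallQueue {
        void enqueue(String call) throws Exception;
    }

    /**
     * Re-queues a call. Returns an empty Optional on success, or the error
     * response that should be sent to the client if the queue rejected the call.
     */
    static Optional<String> requeue(CallQueue queue, String call) {
        try {
            queue.enqueue(call);
            return Optional.empty();
        } catch (Exception e) {
            // Do not swallow the exception: the client is still waiting for
            // an answer, so report the failure back to it.
            return Optional.of("ERROR for " + call + ": " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // A queue that always rejects, simulating backoff.
        CallQueue backingOff = call -> {
            throw new IllegalStateException("server too busy");
        };
        // Prints "Optional[ERROR for call-42: server too busy]"
        System.out.println(requeue(backingOff, "call-42"));
    }
}
```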



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

