[ https://issues.apache.org/jira/browse/HADOOP-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388747#comment-15388747 ]
Peter Shi commented on HADOOP-13404:
------------------------------------
I think there are two possible solutions:
1) Add a ping response to the RPC server and check for that response on the
client side. This requires both client-side and server-side changes, which may
cause compatibility issues.
2) Add a client-side thread that scans the outstanding calls on each
connection and completes a call with a timeout exception if it has not
received a response for a long time. This is a client-only solution.
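
To make option 2 concrete, here is a rough sketch of what such a scanner could look like. It is not taken from Hadoop's ipc.Client; the class PendingCallScanner and its methods register, complete, scanOnce and start are hypothetical names I made up for illustration, and the use of SocketTimeoutException is an assumption about which exception would be appropriate.

```java
import java.net.SocketTimeoutException;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical client-side tracker; NOT Hadoop's actual ipc.Client code. */
class PendingCallScanner implements AutoCloseable {

    /** One outstanding call: when it was sent and where its result goes. */
    private static final class PendingCall {
        final long sentAtMillis;
        final CompletableFuture<byte[]> response = new CompletableFuture<>();

        PendingCall(long sentAtMillis) {
            this.sentAtMillis = sentAtMillis;
        }
    }

    private final Map<Integer, PendingCall> calls = new ConcurrentHashMap<>();
    private final long timeoutMillis;
    private final ScheduledExecutorService scanner =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "rpc-call-timeout-scanner");
            t.setDaemon(true); // do not keep the JVM alive just for scanning
            return t;
        });

    PendingCallScanner(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Start scanning the pending calls every scanIntervalMillis. */
    void start(long scanIntervalMillis) {
        scanner.scheduleWithFixedDelay(
            () -> scanOnce(System.currentTimeMillis()),
            scanIntervalMillis, scanIntervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Record a call that has just been sent on the connection. */
    CompletableFuture<byte[]> register(int callId, long nowMillis) {
        PendingCall call = new PendingCall(nowMillis);
        calls.put(callId, call);
        return call.response;
    }

    /** Deliver a response; returns false if the call is unknown or expired. */
    boolean complete(int callId, byte[] payload) {
        PendingCall call = calls.remove(callId);
        return call != null && call.response.complete(payload);
    }

    /** Fail every call older than the timeout; returns how many expired. */
    int scanOnce(long nowMillis) {
        int expired = 0;
        for (Map.Entry<Integer, PendingCall> e : calls.entrySet()) {
            PendingCall call = e.getValue();
            // remove(key, value) ensures only one thread finishes the call
            if (nowMillis - call.sentAtMillis >= timeoutMillis
                    && calls.remove(e.getKey(), call)) {
                call.response.completeExceptionally(new SocketTimeoutException(
                    "Call " + e.getKey() + " got no response within "
                        + timeoutMillis + " ms"));
                expired++;
            }
        }
        return expired;
    }

    @Override
    public void close() {
        scanner.shutdownNow();
    }
}
```

In this sketch the caller waits on the returned future, so a call that never gets a response fails with a timeout instead of hanging. A late response for an expired call is simply dropped by complete(). Passing the current time into scanOnce is only there to make the expiry logic easy to test without sleeping.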
> RPC call hangs when server side CPU overloaded
> ----------------------------------------------
>
> Key: HADOOP-13404
> URL: https://issues.apache.org/jira/browse/HADOOP-13404
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Peter Shi
>
> In our reliability test, we inject a fault on the namenode that consumes
> 100% of the CPU. After the fault is injected, all requests on existing
> connections hang forever instead of timing out. New connections fail over
> to the other namenode in an HA deployment.
> There is no timeout mechanism for calls on an established connection.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)