[ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072928#comment-13072928 ]

Matt Foley commented on HADOOP-6889:
------------------------------------

Unfortunately this patch diverges a lot from the trunk patch (presumably 
because of 0.20/0.23 code tree divergence), so I could not usefully diff the 
patches and had to review this like a new patch.

In terms of code review, I found no problems.  But it's a large enough patch 
that we are dependent on thorough unit testing to be confident in it.  
So I have two questions:

1. I see a single new test case, TestIPC.testIpcTimeout(), that tests the 
lowest-level timeout functionality, between a client and a TestServer server.  
However, I do not see any test cases that check whether the integration of that 
timeout functionality with, e.g., the InterDatanodeProtocol works as expected. 
(The mod to TestInterDatanodeProtocol merely adapts to the change; it does not 
test the change.)  Similarly, there is no test of the timeout in the context of 
DFSClient with a MiniDFSCluster.  Granted, the original patch to trunk doesn't 
test these either.  But do you feel confident in the patch without such 
additional tests, and why?

2. Are the differences between the trunk and v20 patches due only to code tree 
divergence, or are there changes added to the v20 patch that are not in v23 and 
perhaps should be?  Thanks.


> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: John George
>             Fix For: 0.20-append, 0.20.205.0, 0.22.0
>
>         Attachments: HADOOP-6889-for20.patch, HADOOP-6889.patch, 
> ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not time out as long as the RPC server is alive. 
> What it currently does is that an RPC client sends a ping to the server 
> whenever a socket timeout happens. If the server is still alive, the client 
> continues to wait instead of throwing a SocketTimeoutException. This is to 
> keep a client from retrying while a server is busy, which would make the 
> server even busier. This works great when the RPC server is the NameNode.
> But Hadoop RPC is also used for some client-to-DataNode communications, for 
> example, for getting a replica's length. When a client comes across a 
> problematic DataNode, it gets stuck and cannot switch to a different 
> DataNode. In this case, it would be better for the client to receive a 
> timeout exception.
> I plan to add a new configuration, ipc.client.max.pings, that specifies the 
> maximum number of pings a client may send. If a response cannot be received 
> after the specified maximum number of pings, a SocketTimeoutException is 
> thrown. If this configuration property is not set, a client keeps the current 
> semantics, waiting forever.
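The proposed semantics can be sketched in plain Java. This is an illustrative simulation, not code from the patch: the method name, parameters, and the stand-in for the socket read are all hypothetical, and only the ping-on-timeout loop with a configurable cap is taken from the description above.

```java
import java.net.SocketTimeoutException;

// Hypothetical sketch of the proposed ping-on-timeout loop (not Hadoop code).
class PingLoop {
    // Simulates the client's read loop: each socket timeout triggers a ping
    // and another wait. A negative maxPings means "unlimited", i.e. the
    // current wait-forever behavior when the new property is unset.
    static int waitForResponse(int timeoutsBeforeReply, int maxPings)
            throws SocketTimeoutException {
        int pings = 0;
        while (true) {
            // Stand-in for a blocking socket read that may time out.
            boolean timedOut = pings < timeoutsBeforeReply;
            if (!timedOut) {
                return pings; // response arrived; report pings sent
            }
            if (maxPings >= 0 && pings >= maxPings) {
                throw new SocketTimeoutException(
                        "no response after " + pings + " pings");
            }
            pings++; // send a ping and keep waiting
        }
    }
}
```

With maxPings set, a client talking to a stuck DataNode eventually gets a SocketTimeoutException and can fail over to another replica; with it unset, the loop never gives up, which remains the right behavior for a busy NameNode.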

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
