[ http://issues.apache.org/jira/browse/HADOOP-255?page=comments#action_12440279 ]

Owen O'Malley commented on HADOOP-255:
--------------------------------------
I'm going to hijack this bug. Clearly the original context was fixed by moving from the rpc getMapOutput to a jetty servlet. However, we are seeing cases where the dfs servers have trouble keeping up with the rpc calls. Therefore, I propose that we define a fraction of the ipc.timeout as the maximum time an rpc call can take before it is given to a handler.

> Client Calls are not cancelled after a call timeout
> ---------------------------------------------------
>
>                 Key: HADOOP-255
>                 URL: http://issues.apache.org/jira/browse/HADOOP-255
>             Project: Hadoop
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.2.1
>         Environment: Tested on Linux 2.6
>            Reporter: Naveen Nalam
>         Assigned To: Owen O'Malley
>
> In ipc/Client.java, if a call times out, a SocketTimeoutException is thrown
> but the Call object still exists on the queue.
> What I found was that when transferring very large amounts of data, it's
> common for queued up calls to time out. Yet even though the caller is no
> longer waiting, the request is still serviced on the server and the data is
> sent to the client. The client, after receiving the full response, calls
> callComplete(), which is a noop since nobody is waiting.
> The problem is that the calls that time out will retry and the system gets
> into a situation where data is being transferred around, but it's all data
> for timed out requests and no progress is ever made.
> My quick solution to this was to add a "boolean timedout" to the Call object
> which I set to true whenever the queued caller times out. Then, when the
> client starts to pull over the response data (in Connection::run), it first
> checks whether the Call is timedout and, if so, immediately closes the
> connection.
> I think a good fix for this is to queue requests on the client, and do a
> single sendParam only when there is no outstanding request.
> This will allow closing the connection when we receive a response for a
> request we no longer have pending, reopening the connection, and resending
> the next queued request. I can provide a patch for this, but I've seen a lot
> of recent activity in this area so I'd like to get some feedback first.
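
The proposal at the top of this comment, limiting how long an rpc call may sit in the server queue to a fraction of ipc.timeout, could be sketched roughly as below. This is only an illustration and not the actual ipc/Server.java code. The class StaleCallQueue, the QueuedCall type, and the maxQueueFraction parameter are made-up names. It also assumes that a call exceeding the limit is simply discarded, which the comment implies but does not spell out. The clock is passed in explicitly so the example stays self-contained.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical server-side call queue; an illustration, not Hadoop's real ipc code. */
class StaleCallQueue {

    /** A queued rpc call, stamped with the time the server received it. */
    static final class QueuedCall {
        final int id;
        final long receivedAtMillis;

        QueuedCall(int id, long receivedAtMillis) {
            this.id = id;
            this.receivedAtMillis = receivedAtMillis;
        }
    }

    private final Queue<QueuedCall> queue = new ArrayDeque<>();
    private final long maxQueueMillis;

    /**
     * @param ipcTimeoutMillis the client-side call timeout (the ipc.timeout setting)
     * @param maxQueueFraction assumed fraction of that timeout a call may wait in the queue
     */
    StaleCallQueue(long ipcTimeoutMillis, double maxQueueFraction) {
        this.maxQueueMillis = (long) (ipcTimeoutMillis * maxQueueFraction);
    }

    /** Adds a call to the tail of the queue, recording when it arrived. */
    void enqueue(int id, long nowMillis) {
        queue.add(new QueuedCall(id, nowMillis));
    }

    /**
     * Returns the next call that is still worth handing to a handler,
     * or null if the queue is empty. Calls that waited longer than the
     * limit are dropped, since their caller has likely timed out already.
     */
    QueuedCall nextLiveCall(long nowMillis) {
        QueuedCall call;
        while ((call = queue.poll()) != null) {
            if (nowMillis - call.receivedAtMillis <= maxQueueMillis) {
                return call;
            }
            // Stale call: skip it instead of servicing a request nobody waits for.
        }
        return null;
    }
}
```

With an illustrative timeout of 60000 ms and a fraction of 0.5, a call that has been queued for 40 seconds would be skipped, while one queued for 20 seconds would still be given to a handler. This keeps the server from spending time on responses for callers that have already given up and retried.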
