[ http://issues.apache.org/jira/browse/HADOOP-255?page=all ]
Owen O'Malley updated HADOOP-255:
---------------------------------
Attachment: rpc-timeout.patch
This patch makes the RPC server handlers discard any call that is older than
60% of the ipc.timeout value.
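A minimal sketch of the server-side check described above, assuming each queued call records the time the server received it. `ServerCall`, `StaleCallFilter` and `nextFreshCall` are hypothetical names chosen for illustration; this is not the code in rpc-timeout.patch.

```java
import java.util.Deque;

/** Hypothetical: a queued request as a server handler might see it. */
final class ServerCall {
    final int id;
    final long receivedTimeMillis; // assumed: when the server read the request

    ServerCall(int id, long receivedTimeMillis) {
        this.id = id;
        this.receivedTimeMillis = receivedTimeMillis;
    }
}

/** Hypothetical: decides whether a queued call is too old to be worth servicing. */
final class StaleCallFilter {
    private final long maxAgeMillis;

    /** @param ipcTimeoutMillis the client-side timeout (ipc.timeout), in ms */
    StaleCallFilter(long ipcTimeoutMillis) {
        // Discard threshold: 60% of the timeout, as the patch comment describes.
        this.maxAgeMillis = ipcTimeoutMillis * 60 / 100;
    }

    boolean isStale(ServerCall call, long nowMillis) {
        return nowMillis - call.receivedTimeMillis > maxAgeMillis;
    }

    /** Pops calls until one is still fresh; stale calls are dropped unanswered. */
    ServerCall nextFreshCall(Deque<ServerCall> queue, long nowMillis) {
        ServerCall call;
        while ((call = queue.pollFirst()) != null) {
            if (!isStale(call, nowMillis)) {
                return call;
            }
            // The client has most likely given up on this call, so servicing it
            // would only send data that nobody is waiting for.
        }
        return null; // queue exhausted
    }
}
```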
> Client Calls are not cancelled after a call timeout
> ---------------------------------------------------
>
> Key: HADOOP-255
> URL: http://issues.apache.org/jira/browse/HADOOP-255
> Project: Hadoop
> Issue Type: Bug
> Components: ipc
> Affects Versions: 0.2.1
> Environment: Tested on Linux 2.6
> Reporter: Naveen Nalam
> Assigned To: Owen O'Malley
> Attachments: rpc-timeout.patch
>
>
> In ipc/Client.java, if a call times out, a SocketTimeoutException is thrown,
> but the Call object remains on the queue.
> What I found was that when transferring very large amounts of data, it's
> common for queued-up calls to time out. Yet even though the caller is no
> longer waiting, the request is still serviced on the server and the data is
> sent to the client. After receiving the full response, the client calls
> callComplete(), which is a no-op since nobody is waiting.
> The problem is that calls that time out are retried, and the system gets
> into a state where data is being transferred around, but all of it belongs
> to timed-out requests, so no progress is ever made.
> My quick solution was to add a "boolean timedout" field to the Call object,
> which I set to true whenever the queued caller times out. Then, when the
> client starts to pull over the response data (in Connection::run), it first
> checks whether the Call has timed out and, if so, immediately closes the
> connection.
> I think a good fix for this is to queue requests on the client and do a
> single sendParam only when there is no outstanding request. This would allow
> the client to close the connection when it receives a response for a request
> that is no longer pending, reopen the connection, and resend the next queued
> request. I can provide a patch for this, but I've seen a lot of recent
> activity in this area, so I'd like to get some feedback first.
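The reporter's quick fix (a timedout flag on Call, checked before reading a response) can be sketched as follows. This is a hypothetical simplification, not the actual ipc/Client.java: `Call`, `waitForResponse`, `ResponseDispatcher` and `onResponseHeader` are illustrative names, and closing the connection is modelled with a flag rather than real socket I/O.

```java
import java.net.SocketTimeoutException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical client-side call record carrying the "timedout" flag. */
final class Call {
    final int id;
    private Object value;     // response value, once it arrives
    private boolean done;
    private boolean timedOut; // set when the waiting caller gives up

    Call(int id) { this.id = id; }

    /** Waits up to timeoutMillis; on expiry marks the call timed out and throws. */
    synchronized Object waitForResponse(long timeoutMillis)
            throws InterruptedException, SocketTimeoutException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!done) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                timedOut = true;
                throw new SocketTimeoutException("call " + id + " timed out");
            }
            wait(remaining); // loop guards against spurious wake-ups
        }
        return value;
    }

    synchronized boolean isTimedOut() { return timedOut; }

    synchronized void complete(Object v) {
        value = v;
        done = true;
        notifyAll();
    }
}

/** Hypothetical receiver-side check, modelled on the idea for Connection::run. */
final class ResponseDispatcher {
    private final Map<Integer, Call> pending = new HashMap<>();
    private boolean connectionOpen = true;

    synchronized void register(Call c) { pending.put(c.id, c); }

    /** Returns true if the response body should be read, false if the connection was closed. */
    synchronized boolean onResponseHeader(int callId) {
        Call call = pending.remove(callId);
        if (call == null || call.isTimedOut()) {
            // Nobody is waiting: close rather than pull over a large body.
            connectionOpen = false;
            return false;
        }
        return true;
    }

    synchronized boolean isConnectionOpen() { return connectionOpen; }
}
```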
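The proposed longer-term fix (a client-side queue with at most one request in flight) can be sketched under these assumptions: one connection per server, integer call ids, and sending, closing and reopening are only recorded, not performed. `SerializedConnection`, `submit`, `cancel` and `onResponse` are hypothetical names, not Hadoop's ipc API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical: at most one outstanding request; the rest wait in a queue. */
final class SerializedConnection {
    private final Deque<Integer> queued = new ArrayDeque<>();
    private final List<Integer> sent = new ArrayList<>(); // log standing in for sendParam
    private Integer outstanding;          // id of the single in-flight request, or null
    private boolean outstandingCancelled; // its caller timed out while it was in flight
    private int reconnects;               // counts close-and-reopen events

    /** Queues a request and sends it right away only if nothing is in flight. */
    synchronized void submit(int callId) {
        queued.addLast(callId);
        sendNextIfIdle();
    }

    /** The caller of callId gave up waiting. */
    synchronized void cancel(int callId) {
        if (outstanding != null && outstanding == callId) {
            outstandingCancelled = true; // refuse its response when it arrives
        } else {
            queued.remove(Integer.valueOf(callId)); // never sent: just drop it
        }
    }

    /**
     * Called when a response starts to arrive. With one request in flight it
     * must belong to the outstanding call. Returns true if the body is wanted;
     * for brevity the sketch treats a wanted body as fully read here.
     */
    synchronized boolean onResponse(int callId) {
        if (outstanding == null || outstanding != callId) {
            throw new IllegalStateException("unexpected response " + callId);
        }
        boolean wanted = !outstandingCancelled;
        if (!wanted) {
            reconnects++; // close and reopen instead of reading the stale body
        }
        outstanding = null;
        outstandingCancelled = false;
        sendNextIfIdle();
        return wanted;
    }

    private void sendNextIfIdle() {
        if (outstanding == null && !queued.isEmpty()) {
            outstanding = queued.pollFirst();
            sent.add(outstanding);
        }
    }

    synchronized List<Integer> sentOrder() { return new ArrayList<>(sent); }

    synchronized int reconnects() { return reconnects; }
}
```

The trade-off of this design is throughput: with only one request in flight, calls on the same connection can no longer be pipelined.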
--
This message is automatically generated by JIRA.