[ http://issues.apache.org/jira/browse/HADOOP-255?page=all ]

Owen O'Malley updated HADOOP-255:
---------------------------------

    Attachment: rpc-timeout.patch

This patch makes the RPC server handlers discard any call that is older than 
60% of ipc.timeout.
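
To illustrate the idea, here is a minimal sketch of a handler-side staleness check. It is not the code in rpc-timeout.patch: the class StaleCallFilter, the QueuedCall type, and the method names are all hypothetical, and the only detail taken from the comment above is the 60% threshold relative to ipc.timeout.

```java
import java.util.Queue;

/** Hypothetical sketch; names are illustrative and not taken from rpc-timeout.patch. */
class StaleCallFilter {
    /** A queued call stamped with its arrival time (hypothetical type). */
    static final class QueuedCall {
        final int id;
        final long receivedMillis;

        QueuedCall(int id, long receivedMillis) {
            this.id = id;
            this.receivedMillis = receivedMillis;
        }
    }

    private final long maxAgeMillis;

    StaleCallFilter(long ipcTimeoutMillis) {
        // 60% of the configured timeout, computed with integer arithmetic.
        this.maxAgeMillis = ipcTimeoutMillis * 60 / 100;
    }

    /** True if the call has waited longer than the allowed age. */
    boolean isStale(QueuedCall call, long nowMillis) {
        return nowMillis - call.receivedMillis > maxAgeMillis;
    }

    /** Returns the next call still worth servicing, or null if none remain. */
    QueuedCall nextLiveCall(Queue<QueuedCall> queue, long nowMillis) {
        QueuedCall call;
        while ((call = queue.poll()) != null) {
            if (!isStale(call, nowMillis)) {
                return call;
            }
            // Stale call: the client has likely given up, so drop it unserviced.
        }
        return null;
    }
}
```

Choosing a threshold below 100% of the timeout presumably leaves headroom for the response to reach the client before its own timer fires; that reading is an assumption, not something stated in the patch description.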


> Client Calls are not cancelled after a call timeout
> ---------------------------------------------------
>
>                 Key: HADOOP-255
>                 URL: http://issues.apache.org/jira/browse/HADOOP-255
>             Project: Hadoop
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.2.1
>         Environment: Tested on Linux 2.6
>            Reporter: Naveen Nalam
>         Assigned To: Owen O'Malley
>         Attachments: rpc-timeout.patch
>
>
> In ipc/Client.java, if a call times out, a SocketTimeoutException is thrown 
> but the Call object still exists on the queue.
> What I found was that when transferring very large amounts of data, it's 
> common for queued-up calls to time out. Yet even though the caller is no 
> longer waiting, the request is still serviced on the server and the data is 
> sent to the client. After receiving the full response, the client calls 
> callComplete(), which is a no-op since nobody is waiting.
> The problem is that the calls that time out are retried, and the system gets 
> into a situation where data is being transferred around, but it's all data 
> for timed-out requests and no progress is ever made.
> My quick solution to this was to add a "boolean timedout" to the Call object, 
> which I set to true whenever the queued caller times out. Then, when the 
> client starts to pull over the response data (in Connection::run), it first 
> checks whether the Call has timed out and, if so, immediately closes the 
> connection.
> I think a good fix for this is to queue requests on the client, and do a 
> single sendParam only when there is no outstanding request. This will allow 
> closing the connection when receiving a response for a request we no longer 
> have pending, reopening the connection, and resending the next queued request. I 
> can provide a patch for this, but I've seen a lot of recent activity in this 
> area so I'd like to get some feedback first.
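
The "boolean timedout" workaround described in the report can be sketched roughly as follows. This is not the actual ipc/Client.java code: the class TimedOutCallDemo, its Call type, and shouldCloseConnection are hypothetical names, and treating a response with no matching pending call as a reason to close is an added assumption.

```java
import java.util.Map;

/** Hypothetical sketch of the "timedout" flag idea; not the real ipc.Client. */
class TimedOutCallDemo {
    static final class Call {
        final int id;
        private boolean timedOut; // set when the waiting caller gives up

        Call(int id) {
            this.id = id;
        }

        synchronized void markTimedOut() {
            timedOut = true;
        }

        synchronized boolean isTimedOut() {
            return timedOut;
        }
    }

    /**
     * Decision the receiving thread would make before pulling over a
     * response body: close the connection if nobody is waiting for it.
     */
    static boolean shouldCloseConnection(Map<Integer, Call> pending, int responseId) {
        Call call = pending.get(responseId);
        // Assumption: an unknown id is handled like a timed-out call.
        return call == null || call.isTimedOut();
    }
}
```

In this sketch the caller's wait loop would invoke markTimedOut() just before throwing SocketTimeoutException, and the connection thread would consult shouldCloseConnection() before reading any response data.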

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
