[ https://issues.apache.org/jira/browse/HADOOP-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580492#action_12580492 ]
Hairong Kuang commented on HADOOP-2188:
---------------------------------------
> As you mentioned, there is more synchronization involved. It is harder to check
> the correctness with 'isClosed' etc. For example, it took some time to see what
> happens when sendParam() returns silently when isClosed is true.
Since this patch removes SocketTimeoutException, it exposes quite a lot of
incorrect synchronization in the code. Previously, applications received a
SocketTimeoutException when a call was lost, but now they get stuck
forever. It took me quite a lot of energy to debug and sort out the
synchronization part. Thank you for taking the time to check its correctness.
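
To illustrate the kind of hang described above, here is a minimal sketch, not
the code from the patch. PendingCall and ConnectionSketch are hypothetical
names. The idea it shows: once there is no client-side timeout, every waiting
caller has to be woken up when the connection closes, and a call submitted
after the close has to be failed explicitly rather than dropped silently.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for one outstanding RPC call. */
final class PendingCall {
    private boolean done;
    private Object value;
    private IOException error;

    synchronized void complete(Object v) { value = v; done = true; notifyAll(); }

    synchronized void fail(IOException e) { error = e; done = true; notifyAll(); }

    /** Blocks until the call completes or fails; there is no timeout. */
    synchronized Object await() throws IOException, InterruptedException {
        while (!done) {
            wait();
        }
        if (error != null) {
            throw error;
        }
        return value;
    }

    synchronized boolean isDone() { return done; }

    synchronized IOException getError() { return error; }
}

/** Hypothetical connection that tracks calls still waiting for a response. */
final class ConnectionSketch {
    private final Map<Integer, PendingCall> calls = new HashMap<>();
    private boolean closed;

    /**
     * Registers a call. If the connection is already closed, the call is
     * failed right away so that its caller does not wait forever.
     */
    synchronized boolean addCall(int id, PendingCall call) {
        if (closed) {
            call.fail(new IOException("connection closed"));
            return false;
        }
        calls.put(id, call);
        return true;
    }

    /** Marks the connection closed and wakes up every waiting caller. */
    synchronized void close(IOException cause) {
        if (closed) {
            return;
        }
        closed = true;
        for (PendingCall c : calls.values()) {
            c.fail(cause);
        }
        calls.clear();
    }
}
```

In this sketch a thread blocked in await() is released by close(), and
addCall() on a closed connection reports the failure to the caller instead of
returning as if the request had been sent.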
> What should the server do if some clients just don't read from the sockets? I
> think purging exists only to handle exceptional cases like (unintentionally)
> rogue clients. One actual case that happened is that a user accidentally
> started thousands of clients from one machine and these clients could not
> read.
I think we should assume that clients use the IPC Client to talk to the IPC
server, so there is no need to worry about their not reading from the
sockets. In the case of 1000 clients per machine, if they can all send
requests, why could they not read?
> RPC should send a ping rather than use client timeouts
> ------------------------------------------------------
>
> Key: HADOOP-2188
> URL: https://issues.apache.org/jira/browse/HADOOP-2188
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs, ipc
> Reporter: Owen O'Malley
> Assignee: Hairong Kuang
> Attachments: ipc-timeout.patch, ipc-timeout1.patch,
> ipc-timeout2.patch, ipc-timeout3.patch, rpc-to.patch
>
>
> Current RPC (really IPC) relies on client side timeouts to find "dead"
> sockets. I propose that we have a thread that once a minute (if the
> connection has been idle) writes a "ping" message to the socket. The client
> can detect a dead socket by the resulting error on the write, so no client
> side timeout is required. Also note that the ipc server does not need to
> respond to the ping, just discard it.
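
The proposal in the issue description could be sketched roughly as follows.
This is an illustration under stated assumptions, not the attached patch:
IdlePinger, PING_MARKER and the value -1 are hypothetical, and the current
time is passed in as a parameter so the logic can be exercised without a real
clock. A background thread would call pingIfIdle() periodically and close the
connection when it reports WRITE_FAILED.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch of an idle-connection ping; not the Hadoop IPC code. */
final class IdlePinger {
    enum Result { NOT_IDLE, PING_SENT, WRITE_FAILED }

    /** Assumed marker value; the server would read it and discard it. */
    static final int PING_MARKER = -1;

    private final DataOutputStream out;
    private final long pingIntervalMillis;
    private long lastActivityMillis;

    IdlePinger(OutputStream rawOut, long pingIntervalMillis, long nowMillis) {
        this.out = new DataOutputStream(rawOut);
        this.pingIntervalMillis = pingIntervalMillis;
        this.lastActivityMillis = nowMillis;
    }

    /** Call this whenever a real request is written to the connection. */
    synchronized void recordActivity(long nowMillis) {
        lastActivityMillis = nowMillis;
    }

    /**
     * Writes a ping only if the connection has been idle for at least the
     * configured interval. A failed write is reported as WRITE_FAILED, which
     * the caller can treat as a dead socket.
     */
    synchronized Result pingIfIdle(long nowMillis) {
        if (nowMillis - lastActivityMillis < pingIntervalMillis) {
            return Result.NOT_IDLE;
        }
        try {
            out.writeInt(PING_MARKER);
            out.flush();
        } catch (IOException e) {
            return Result.WRITE_FAILED;
        }
        lastActivityMillis = nowMillis;
        return Result.PING_SENT;
    }
}
```

Note that with TCP a write to a dead peer does not always fail immediately, so
in practice the error may only surface on a later ping.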