[
https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315460#comment-14315460
]
Sanjay Radia commented on HADOOP-11552:
---------------------------------------
bq. Are you proposing to keep the TCP session open, but reuse the handler
thread for something else, while the RPC is progressing?
bq. Yes, the intent is to keep the TCP session open and re-use the handlers
Note that our RPC system forces the handler thread to send the response, so we
need a large number of handler threads, because some requests (such as a write
operation on the NN) take longer since they have to write to the journal. Other
RPC systems, and also request-response message-passing systems, allow hand-off
to any thread to do the work and reply. The TCP connection being kept open is
not due to the handler-thread binding; it is because our RPC layer depends on a
connection close to detect server failures (and I believe we send some heartbeat
bytes to detect server failures promptly). So we need to keep the connection
open until the RPC operation has completed.
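To make the hand-off idea concrete, here is a minimal sketch in plain Java. It is not the Hadoop IPC code and not the attached patch: HandoffSketch, PendingCall, submit and the pool sizes are all hypothetical names chosen for illustration. The CompletableFuture stands in for the open connection that the reply is eventually written to; the single handler thread only accepts the call and hands it to a worker, so a slow call does not hold up a fast one.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: none of these names are Hadoop IPC APIs.
class HandoffSketch {
    // Stands in for a call whose connection stays open until the reply is sent.
    static final class PendingCall {
        final int callId;
        final CompletableFuture<String> response = new CompletableFuture<>();
        PendingCall(int callId) { this.callId = callId; }
    }

    // A single handler thread: it only accepts calls and hands them off.
    private final ExecutorService handlers = Executors.newFixedThreadPool(1);
    // Worker threads do the slow part (e.g. a journal write) and send the reply.
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    CompletableFuture<String> submit(int callId, long workMillis) {
        PendingCall call = new PendingCall(callId);
        handlers.execute(() -> {
            // The handler returns as soon as the work is queued; it is not
            // bound to the call for the duration of the operation.
            workers.execute(() -> {
                try {
                    Thread.sleep(workMillis); // simulated slow operation
                    call.response.complete("done:" + call.callId);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    call.response.completeExceptionally(e);
                }
            });
        });
        return call.response;
    }

    void shutdown() {
        handlers.shutdown();
        workers.shutdown();
    }
}
```

With one handler thread, submit(1, 300) followed by submit(2, 10) lets the second call complete without waiting for the first, which is the property the hand-off is meant to provide.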
Now the impact on RPC connections that you raised:
* for normal end-clients (e.g. HDFS clients) the connections will remain open
as in the original case, i.e. until the request is completed and the reply is
sent. Hence the number of such connections will be the same.
* for internal clients where the request is of type "do you have more work for
me" (as sent by DN or NM) the number of connections will increase but will be
bounded. Here we can have a hybrid approach where the RM could keep a few
requests blocked and reply only when work is available and for other such
requests it could say "no work, but try 2 seconds later".
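The hybrid approach in the second bullet can be sketched as follows. This is only an illustration under assumptions: HybridPollSketch, askForWork, offerWork and the maxParked limit are hypothetical and do not correspond to any YARN or Hadoop API. Up to maxParked requests are held open and answered when work arrives; any further request gets the "try later" reply immediately, which is what keeps the number of blocked connections bounded.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: names and limits are illustrative, not YARN APIs.
class HybridPollSketch {
    static final String RETRY_LATER = "no work, but try 2 seconds later";

    private final int maxParked;
    // Requests currently held open, waiting for work.
    private final Deque<CompletableFuture<String>> parked = new ArrayDeque<>();
    // Work that arrived while nobody was waiting.
    private final Deque<String> pendingWork = new ArrayDeque<>();

    HybridPollSketch(int maxParked) {
        this.maxParked = maxParked;
    }

    // Called for each "do you have more work for me" request.
    synchronized CompletableFuture<String> askForWork() {
        if (!pendingWork.isEmpty()) {
            // Work is already available: reply right away.
            return CompletableFuture.completedFuture(pendingWork.poll());
        }
        if (parked.size() < maxParked) {
            // Keep this request blocked; it is answered by offerWork().
            CompletableFuture<String> reply = new CompletableFuture<>();
            parked.add(reply);
            return reply;
        }
        // Too many blocked requests already: ask the caller to poll again.
        return CompletableFuture.completedFuture(RETRY_LATER);
    }

    // Called when new work becomes available on the server side.
    synchronized void offerWork(String work) {
        CompletableFuture<String> waiter = parked.poll();
        if (waiter != null) {
            waiter.complete(work); // answer the oldest blocked request
        } else {
            pendingWork.add(work);
        }
    }
}
```

The choice of maxParked trades latency against the number of open, idle requests; the value would have to be tuned per deployment.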
> Allow handoff on the server side for RPC requests
> -------------------------------------------------
>
> Key: HADOOP-11552
> URL: https://issues.apache.org/jira/browse/HADOOP-11552
> Project: Hadoop Common
> Issue Type: Improvement
> Components: ipc
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: HADOOP-11552.1.wip.txt
>
>
> An RPC server handler thread is tied up for each incoming RPC request. This
> isn't ideal, since this essentially implies that RPC operations should be
> short lived, and most operations which could take time end up falling back to
> a polling mechanism.
> Some use cases where this is useful.
> - YARN submitApplication - which currently submits, followed by a poll to
> check if the application is accepted while the submit operation is written
> out to storage. This can be collapsed into a single call.
> - YARN allocate - requests and allocations use the same protocol. New
> allocations are received via polling.
> The allocate protocol could be split into a request/heartbeat along with a
> 'awaitResponse'. The request/heartbeat is sent only when there's a request or
> on a much longer heartbeat interval. awaitResponse is always left active with
> the RM - and returns the moment something is available.
> MapReduce/Tez task to AM communication is another example of this pattern.
> The same pattern of splitting calls can be used for other protocols as well.
> This should serve to improve latency, as well as reduce network traffic since
> the keep-alive heartbeat can be sent less frequently.
> I believe there are some cases in HDFS as well, where DNs get told to
> perform some operations when they heartbeat into the NN.