[
https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081950#comment-14081950
]
Ming Ma commented on HADOOP-10597:
----------------------------------
Thanks, Jing and Arpit.
1. In the current implementation, the RPC server only throws RetriableException
back to the client when the RPC queue is full (or, more specifically with
HADOOP-9460, when the RPC queue is full for that RPC user). So before the RPC
queue fills up, there should be no difference. It might be interesting to
verify the "large number of connections" scenario: the blocking approach could
hold up lots of TCP connections and thus prevent other users' requests from
connecting.
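To illustrate the difference, here is a minimal sketch of a non-blocking
enqueue that fails fast with RetriableException instead of parking the reader
thread. The CallQueue class and the RetriableException stand-in are
illustrative assumptions, not Hadoop's actual internals.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for Hadoop's org.apache.hadoop.ipc.RetriableException.
class RetriableException extends Exception {
    RetriableException(String msg) { super(msg); }
}

// Hypothetical simplified call queue, for illustration only.
class CallQueue {
    private final BlockingQueue<Runnable> queue;

    CallQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking enqueue: instead of blocking the reader thread (and its
    // TCP connection) on a full queue, fail fast so the server can send
    // RetriableException back to the client and free the connection.
    void add(Runnable call) throws RetriableException {
        if (!queue.offer(call)) {
            throw new RetriableException("Call queue is full; please retry later");
        }
    }
}
```

With a blocking put() the reader thread would stall here; with offer() the
rejection is surfaced immediately and the client decides when to retry.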
2. The value of a server-defined backoff policy. So far I don't have any use
case that requires the server to specify the backoff policy, so it is possible
that all we need is for the RPC server to throw RetriableException without any
backoff policy. I put it there for extensibility, based on Steve's suggestion;
it might still be useful later. What if the client doesn't honor the policy?
In a controlled environment, we can assume a single client will use the Hadoop
RPC client, which enforces the policy; if we have many clients, then the
backoff policy component in the RPC server, such as LinearClientBackoffPolicy,
can keep state and adjust the backoff policy parameters.
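To make the extensibility point concrete, here is a rough sketch of what a
stateful, server-side policy along the lines of LinearClientBackoffPolicy
could look like. The constructor parameters and method names are illustrative
assumptions, not the API from the patch.

```java
// Sketch of a server-side linear backoff policy; names and parameters are
// hypothetical, not taken from the HADOOP-10597 patch.
class LinearClientBackoffPolicy {
    private final long stepMillis;  // backoff added per consecutive rejection
    private final long maxMillis;   // cap on the advertised backoff

    // Server-side state that could be used to adapt the parameters
    // when clients ignore the advertised policy.
    private int rejectedCalls = 0;

    LinearClientBackoffPolicy(long stepMillis, long maxMillis) {
        this.stepMillis = stepMillis;
        this.maxMillis = maxMillis;
    }

    // Linear backoff: the wait grows by a fixed step per consecutive
    // rejection, capped at maxMillis.
    long backoffMillis(int consecutiveRejections) {
        return Math.min(maxMillis, (long) consecutiveRejections * stepMillis);
    }

    void recordRejection() { rejectedCalls++; }

    int getRejectedCalls() { return rejectedCalls; }
}
```

Because the policy object lives in the server, it can observe rejection counts
over time and tighten stepMillis or maxMillis for misbehaving clients.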
3. How this relates to HADOOP-9640. HADOOP-9640 is quite useful, and client
backoff can be complementary to it. FairQueue is currently blocking: if a
given RPC request's enqueue into FairQueue is blocked by the FairQueue policy,
it will hold up the TCP connection and the reader threads. If we use FairQueue
together with client backoff, requests from a heavily loaded application won't
hold up TCP connections and the reader threads, thus allowing other
applications' requests to be processed more quickly. Some evaluation comparing
HADOOP-9640 with "HADOOP-9640 + client backoff" might be useful. I will follow
up with Chris Li on that.
Are there any other scenarios? For example, we could have the RPC server
reject requests based on user id, method name, or machine IP in certain
operational situations. Granted, these can also be handled at a higher layer.
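As a sketch of that kind of operational rejection, a simple predicate over
user, method, and client address might look like the following. The class and
method names here are entirely hypothetical; nothing like this exists in
Hadoop today.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical operational request filter, for illustration only: rejects
// calls by user id, RPC method name, or client address.
class RequestRejectionPolicy {
    private final Set<String> blockedUsers = new HashSet<>();
    private final Set<String> blockedMethods = new HashSet<>();
    private final Set<String> blockedAddresses = new HashSet<>();

    void blockUser(String user) { blockedUsers.add(user); }
    void blockMethod(String method) { blockedMethods.add(method); }
    void blockAddress(String address) { blockedAddresses.add(address); }

    // A request is rejected if any of its attributes is on a block list.
    boolean shouldReject(String user, String method, String address) {
        return blockedUsers.contains(user)
            || blockedMethods.contains(method)
            || blockedAddresses.contains(address);
    }
}
```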
> Evaluate if we can have RPC client back off when server is under heavy load
> ---------------------------------------------------------------------------
>
> Key: HADOOP-10597
> URL: https://issues.apache.org/jira/browse/HADOOP-10597
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Ming Ma
> Assignee: Ming Ma
> Attachments: HADOOP-10597-2.patch, HADOOP-10597.patch,
> RPCClientBackoffDesignAndEvaluation.pdf
>
>
> Currently, if an application hits the NN too hard, RPC requests end up in a
> blocking state, assuming the OS doesn't run out of connections.
> Alternatively, RPC or the NN can throw some well-defined exception back to
> the client based on certain policies when it is under heavy load; the client
> will understand such an exception and do exponential backoff, as another
> implementation of RetryInvocationHandler.
--
This message was sent by Atlassian JIRA
(v6.2#6252)