[
https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081950#comment-14081950
]
Ming Ma commented on HADOOP-10597:
----------------------------------
Thanks, Jing and Arpit.
1. In the current implementation, the RPC server only throws RetriableException
back to the client when the RPC queue is full (or, more specifically with
HADOOP-9460, when the RPC queue is full for that RPC user). So before the RPC
queue fills up, there should be no difference. It might be interesting to
verify the "large number of connections" scenario: the blocking approach could
hold up lots of TCP connections and thus prevent other users' requests from
connecting.
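To illustrate the difference, here is a minimal sketch of a non-blocking
enqueue that fails fast with RetriableException instead of parking the reader
thread. The CallQueue class and the RetriableException stand-in are
illustrative assumptions, not Hadoop's actual internals.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for Hadoop's org.apache.hadoop.ipc.RetriableException.
class RetriableException extends Exception {
    RetriableException(String msg) { super(msg); }
}

// Hypothetical simplified call queue, for illustration only.
class CallQueue {
    private final BlockingQueue<Runnable> queue;

    CallQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking enqueue: instead of blocking the reader thread (and its
    // TCP connection) on a full queue, fail fast so the server can send
    // RetriableException back to the client and free the connection.
    void add(Runnable call) throws RetriableException {
        if (!queue.offer(call)) {
            throw new RetriableException("Call queue is full; please retry later");
        }
    }
}
```

With a blocking put() the reader thread would stall here; with offer() the
rejection is surfaced immediately and the client decides when to retry.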
2. The value of a server-defined backoff policy. So far I don't have any use
case that requires the server to specify the backoff policy, so it is possible
that all we need is for the RPC server to throw RetriableException without any
backoff policy. I put it there for extensibility, based on Steve's suggestion;
it might still be useful later. What if the client doesn't honor the policy?
In a controlled environment, we can assume a single client will use the Hadoop
RPC client, which enforces the policy; if we have many clients, then the
backoff policy component in the RPC server, such as LinearClientBackoffPolicy,
can keep state and adjust the backoff policy parameters.
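To make the extensibility point concrete, here is a rough sketch of what a
stateful, server-side policy along the lines of LinearClientBackoffPolicy
could look like. The constructor parameters and method names are illustrative
assumptions, not the API from the patch.

```java
// Sketch of a server-side linear backoff policy; names and parameters are
// hypothetical, not taken from the HADOOP-10597 patch.
class LinearClientBackoffPolicy {
    private final long stepMillis;  // backoff added per consecutive rejection
    private final long maxMillis;   // cap on the advertised backoff

    // Server-side state that could be used to adapt the parameters
    // when clients ignore the advertised policy.
    private int rejectedCalls = 0;

    LinearClientBackoffPolicy(long stepMillis, long maxMillis) {
        this.stepMillis = stepMillis;
        this.maxMillis = maxMillis;
    }

    // Linear backoff: the wait grows by a fixed step per consecutive
    // rejection, capped at maxMillis.
    long backoffMillis(int consecutiveRejections) {
        return Math.min(maxMillis, (long) consecutiveRejections * stepMillis);
    }

    void recordRejection() { rejectedCalls++; }

    int getRejectedCalls() { return rejectedCalls; }
}
```

Because the policy object lives in the server, it can observe rejection counts
over time and tighten stepMillis or maxMillis for misbehaving clients.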
3. How this relates to HADOOP-9640. HADOOP-9640 is quite useful, and client
backoff can be complementary to it. FairQueue is currently blocking: if a
given RPC request's enqueue into FairQueue is blocked by the FairQueue policy,
it will hold up the TCP connection and the reader threads. If we use FairQueue
together with client backoff, requests from a heavily loaded application won't
hold up TCP connections and the reader threads, thus allowing other
applications' requests to be processed more quickly. Some evaluation comparing
HADOOP-9640 with "HADOOP-9640 + client backoff" might be useful. I will follow
up with Chris Li on that.
Are there any other scenarios? For example, we could have the RPC server
reject requests based on user id, method name, or machine IP in certain
operational situations. Granted, these can also be handled at a higher layer.
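As a sketch of that kind of operational rejection, a simple predicate over
user, method, and client address might look like the following. The class and
method names here are entirely hypothetical; nothing like this exists in
Hadoop today.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical operational request filter, for illustration only: rejects
// calls by user id, RPC method name, or client address.
class RequestRejectionPolicy {
    private final Set<String> blockedUsers = new HashSet<>();
    private final Set<String> blockedMethods = new HashSet<>();
    private final Set<String> blockedAddresses = new HashSet<>();

    void blockUser(String user) { blockedUsers.add(user); }
    void blockMethod(String method) { blockedMethods.add(method); }
    void blockAddress(String address) { blockedAddresses.add(address); }

    // A request is rejected if any of its attributes is on a block list.
    boolean shouldReject(String user, String method, String address) {
        return blockedUsers.contains(user)
            || blockedMethods.contains(method)
            || blockedAddresses.contains(address);
    }
}
```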
> Evaluate if we can have RPC client back off when server is under heavy load
> ---------------------------------------------------------------------------
>
> Key: HADOOP-10597
> URL: https://issues.apache.org/jira/browse/HADOOP-10597
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Ming Ma
> Assignee: Ming Ma
> Attachments: HADOOP-10597-2.patch, HADOOP-10597.patch,
> RPCClientBackoffDesignAndEvaluation.pdf
>
>
> Currently, if an application hits the NN too hard, RPC requests end up in a
> blocking state, assuming the OS doesn't run out of connections.
> Alternatively, RPC or the NN can throw some well-defined exception back to
> the client based on certain policies when it is under heavy load; the client
> will understand such an exception and do exponential backoff, as another
> implementation of RetryInvocationHandler.
--
This message was sent by Atlassian JIRA
(v6.2#6252)