[
https://issues.apache.org/jira/browse/HADOOP-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238144#comment-16238144
]
Wei Yan commented on HADOOP-15016:
----------------------------------
Thanks for the comments, [~xyao].
{quote}
Have you looked into RPC CallerID (HDFS-9184), which is designed to trace
callers under different services (Yarn/Spark/Hive/Tez)? You could extend an
IdentityProvider to leverage that and thus avoid penalizing all the RPC calls
from the same service user.
{quote}
Yes, we plan to enable HDFS-9184 to log more detailed information for audit
purposes. But here we would like to group calls by service user, regardless of
which engine the calls come through.
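For context, here is a minimal sketch of how a service could tag its calls via
the CallerContext API introduced by HDFS-9184; the "etl_job" tag format and the
helper class are assumptions for illustration, not existing Hadoop code:
{code:java}
import org.apache.hadoop.ipc.CallerContext;

// Illustrative sketch: a service (e.g. an ETL driver) tags its RPC calls so the
// NameNode audit log can attribute them to the originating job, independently of
// the service user the calls are grouped under for fair-queuing purposes.
public class AuditTagExample {
  public static void tagCurrentThread(String jobId) {
    // "etl_job" is a hypothetical tag name chosen for this example.
    CallerContext context = new CallerContext.Builder("etl_job:" + jobId).build();
    CallerContext.setCurrent(context);
    // Subsequent RPC calls issued from this thread carry the context string,
    // which appears in the NameNode audit log when
    // hadoop.caller.context.enabled is set to true.
  }
}
{code}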
{quote}
Can you elaborate on how to quantify the cost of RPC calls, which are not equal
in terms of their cost on the NN? The same RPC call with different parameters
may also differ significantly in cost. Can you post more details of the
proposal for discussion?
{quote}
We also looked into how to build a cost-based FairCallQueue, and have some
early results. One rough idea we have now is to simply assign different weights
to reads, writes, and expensive calls such as large listStatus requests,
instead of tracking the detailed lock time of each RPC call. Since the
cost-based work is separate from this JIRA, we'll open another ticket once we
have more results.
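To make that rough idea concrete, below is a minimal sketch of the kind of
static cost table we have in mind; the class, the method name, and the numbers
are hypothetical and not part of any existing Hadoop interface:
{code:java}
// Hypothetical sketch: map each RPC method to a fixed relative cost instead of
// measuring per-call lock time; a user's accumulated cost (rather than raw call
// count) would then drive the FairCallQueue scheduling decision.
public class StaticRpcCostEstimator {

  /** Returns an assumed relative cost for an RPC call; numbers are illustrative. */
  public int costOf(String method, int resultSize) {
    switch (method) {
      case "getFileInfo":
      case "getBlockLocations":
        return 1;                          // cheap reads
      case "listStatus":
        return resultSize > 1000 ? 10 : 2; // large directory listings cost more
      case "create":
      case "delete":
      case "rename":
        return 5;                          // writes hold the namesystem write lock
      default:
        return 1;
    }
  }
}
{code}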
> Add reservation support to RPC FairCallQueue
> --------------------------------------------
>
> Key: HADOOP-15016
> URL: https://issues.apache.org/jira/browse/HADOOP-15016
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Wei Yan
> Assignee: Wei Yan
> Priority: Normal
>
> FairCallQueue was introduced to provide RPC resource fairness among different
> users. In the current implementation, each user is weighted equally, and the
> processing priority of an RPC call is based on how many requests that user
> has sent before. This works well when the cluster is shared among several
> end-users.
> However, this has some limitations when a cluster is shared between end-users
> and service jobs, such as ETL jobs that run under a service account and need
> to issue lots of RPC calls. When the NameNode becomes quite busy, these jobs
> can easily be backed off and given low priority. We cannot simply treat such
> jobs as a "bad" user who randomly issues too many calls, as their calls are
> legitimate. Also, it is unfair to weight an end-user and a heavy service user
> equally when allocating RPC resources.
> One idea here is to introduce reservation support for RPC resources. That is,
> for some services, we reserve a portion of the RPC resources for their calls.
> This idea is very similar to how YARN manages CPU/memory resources among
> different resource queues. In a little more detail: along with the existing
> FairCallQueue setup (e.g. 4 queues with different priorities), we would add
> some additional special queues, one for each special service user. For each
> special service user, we provide a guaranteed RPC share (like 10%, which can
> be aligned with its YARN resource share), and this percentage can be
> converted to a weight used in the WeightedRoundRobinMultiplexer. A quick
> example: we have 4 default queues with default weights (8, 4, 2, 1), and two
> special service users (user1 with a 10% share, and user2 with a 15% share).
> The default weights sum to 15 and cover the remaining 75% of capacity, so we
> end up with 6 queues: the 4 default queues (weights 8, 4, 2, 1) and 2 special
> queues (user1Queue weighted 15*10%/75%=2, and user2Queue weighted
> 15*15%/75%=3); a code sketch of this conversion follows the description.
> Incoming RPC calls from special service users will be put directly into the
> corresponding reserved queue; all other calls follow the current
> implementation.
> By default, there are no special users and all RPC requests follow the
> existing FairCallQueue implementation.
> We would like to hear more comments on this approach, and would also like to
> know whether there are better solutions. We will post a detailed design once
> we get some early comments.
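A minimal sketch of the share-to-weight conversion referenced in the
description above; the helper class is hypothetical, not existing Hadoop code,
and simply reproduces the weights 2 and 3 computed in the example:
{code:java}
// Hypothetical helper: convert guaranteed shares for special service users into
// weights comparable to the existing FairCallQueue level weights, so they can be
// fed to the WeightedRoundRobinMultiplexer.
public class ReservedQueueWeights {

  /**
   * @param defaultWeights weights of the existing priority levels, e.g. {8, 4, 2, 1}
   * @param reservedShares guaranteed share per special user, e.g. {0.10, 0.15}
   * @return weight of each special user's reserved queue
   */
  public static double[] toWeights(int[] defaultWeights, double[] reservedShares) {
    int defaultTotal = 0;
    for (int w : defaultWeights) {
      defaultTotal += w;                      // 8 + 4 + 2 + 1 = 15
    }
    double reservedTotal = 0;
    for (double s : reservedShares) {
      reservedTotal += s;                     // 0.10 + 0.15 = 0.25
    }
    double[] weights = new double[reservedShares.length];
    for (int i = 0; i < reservedShares.length; i++) {
      // The default weights cover (1 - reservedTotal) of the capacity, so scale:
      // 15 * 0.10 / 0.75 = 2 for user1, 15 * 0.15 / 0.75 = 3 for user2.
      weights[i] = defaultTotal * reservedShares[i] / (1 - reservedTotal);
    }
    return weights;
  }
}
{code}
With these weights, the multiplexer would drain the reserved queues in
proportion to their guaranteed shares relative to the default queues.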