[
https://issues.apache.org/jira/browse/HADOOP-13128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157642#comment-16157642
]
Wei Yan commented on HADOOP-13128:
----------------------------------
[~xyao] thanks for sharing the design. We have a very similar issue to the one
discussed in the doc, and the resource coupon is a very good idea. Our Hadoop
cluster is shared among multiple different services/jobs/queries, and some
services/jobs (ETL/ingestion) may send too many RPC calls to the NN. Under the
current implementation, these jobs can easily get backed off and deprioritized
because they run under the same service account, and it is not straightforward
to distribute these calls across multiple service accounts. Also, some of
these jobs get guaranteed YARN resources, but they still sometimes get delayed
due to RPC starvation.
Instead of using the resource coupon idea to manage RPC resources, we're
looking into some more static approaches (since the number of the
abovementioned services/jobs is very small, fewer than 10) and trying to
allocate a dedicated RPC share for certain service users. Along with the
existing FairCallQueue setup (e.g., 10 queues with different priorities), we
would add some additional special queues, one per special user. For each
special user, we provide a guaranteed RPC share (e.g., 10%, which can be
aligned with its YARN resource share), and this percentage can be converted to
a weight used in WeightedRoundRobinMultiplexer.
A quick example: we have 4 default queues with default weights (8, 4, 2, 1)
and two special service users (user1 with a 10% share, and user2 with a 15%
share). So in the end we'll have 6 queues: the 4 default queues (with weights
8, 4, 2, 1) and 2 special queues (user1Queue weighted 15*10%/75%=2, and
user2Queue weighted 15*15%/75%=3), where 15 is the sum of the default weights
and 75% is the RPC share left for the default queues.
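To make the conversion concrete, here is a rough sketch (the class and method
names below are purely illustrative, not existing FairCallQueue code):
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper (not part of FairCallQueue): converts per-user RPC
// shares into multiplexer weights comparable to the default queue weights.
public class SpecialQueueWeights {

  public static Map<String, Double> computeWeights(
      int[] defaultWeights, Map<String, Double> userShares) {
    int defaultSum = 0;                        // 8 + 4 + 2 + 1 = 15
    for (int w : defaultWeights) {
      defaultSum += w;
    }
    double specialShare = 0.0;                 // 10% + 15% = 25%
    for (double s : userShares.values()) {
      specialShare += s;
    }
    double defaultShare = 1.0 - specialShare;  // 75% left for default queues

    Map<String, Double> weights = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : userShares.entrySet()) {
      // weight = defaultSum * userShare / defaultShare, e.g. 15*10%/75% = 2
      weights.put(e.getKey(), defaultSum * e.getValue() / defaultShare);
    }
    return weights;
  }

  public static void main(String[] args) {
    Map<String, Double> shares = new LinkedHashMap<>();
    shares.put("user1", 0.10);
    shares.put("user2", 0.15);
    // Prints {user1=2.0, user2=3.0}, matching the example above.
    System.out.println(computeWeights(new int[] {8, 4, 2, 1}, shares));
  }
}
{code}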
For each incoming RPC call, we'll add one additional check. If it comes from a
special user, it will be put into the dedicated queue reserved for that user;
other calls will follow the current decayed-call-count mechanism and be put
into the default queues.
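The extra check could look roughly like this (a hypothetical class; the real
change would hook into FairCallQueue's queue-selection path, which is not
shown here):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the additional routing check; this is not real
// FairCallQueue code, just the shape of the decision described above.
public class SpecialUserRouter {
  // userName -> index of that user's dedicated queue (e.g. 4 or 5 above).
  private final Map<String, Integer> specialUserToQueue =
      new ConcurrentHashMap<>();

  public void addSpecialUser(String user, int dedicatedQueueIndex) {
    specialUserToQueue.put(user, dedicatedQueueIndex);
  }

  /**
   * @param user          the RPC caller
   * @param decayPriority priority the existing decayed-call-count scheduler
   *                      would assign (0 .. numDefaultQueues - 1)
   * @return index of the queue the call should be put into
   */
  public int selectQueueIndex(String user, int decayPriority) {
    Integer dedicated = specialUserToQueue.get(user);
    if (dedicated != null) {
      return dedicated;     // special user: always its reserved queue
    }
    return decayPriority;   // everyone else: current FairCallQueue behavior
  }
}
{code}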
On the handler side, handlers fetch new calls from the queues using the index
provided by WeightedRoundRobinMultiplexer.
By default, there are no special users and all RPC requests follow the
existing FairCallQueue implementation.
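For illustration only, here is a toy version of the weighted round-robin index
selection (not the real WeightedRoundRobinMultiplexer), showing how the
weights 8, 4, 2, 1, 2, 3 from the example translate into the guaranteed
shares:
{code:java}
// Toy weighted round-robin index generator; handlers would call
// getAndAdvanceCurrentIndex() to decide which queue to take from next.
public class ToyWrrMux {
  private final int[] weights;
  private int currentQueue = 0;  // queue currently being drawn from
  private int drawsLeft;         // remaining draws before advancing

  public ToyWrrMux(int[] weights) {
    this.weights = weights.clone();
    this.drawsLeft = weights[0];
  }

  public synchronized int getAndAdvanceCurrentIndex() {
    if (drawsLeft == 0) {
      currentQueue = (currentQueue + 1) % weights.length;
      drawsLeft = weights[currentQueue];
    }
    drawsLeft--;
    return currentQueue;
  }

  public static void main(String[] args) {
    ToyWrrMux mux = new ToyWrrMux(new int[] {8, 4, 2, 1, 2, 3});
    // Over one full cycle of 20 draws, queue 4 (user1) is chosen 2 times
    // (10%) and queue 5 (user2) 3 times (15%), matching their shares.
    for (int i = 0; i < 20; i++) {
      System.out.print(mux.getAndAdvanceCurrentIndex() + " ");
    }
  }
}
{code}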
We would like to hear more comments on this approach; we would also like to
know about any other available approaches.
> Manage Hadoop RPC resource usage via resource coupon
> ----------------------------------------------------
>
> Key: HADOOP-13128
> URL: https://issues.apache.org/jira/browse/HADOOP-13128
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Attachments: HADOOP-13128-Proposal-20160511.pdf
>
>
> HADOOP-9640 added the RPC Fair Call Queue and HADOOP-10597 added RPC backoff
> to ensure fair usage of HDFS namenode resources. YARN, the Hadoop cluster
> resource manager, currently manages the CPU and memory resources for
> jobs/tasks but does not directly manage storage resources such as HDFS
> namenode and datanode usage. As a result, a high-priority YARN job may send
> too many RPC requests to the HDFS namenode and get demoted into low-priority
> call queues due to the lack of reservation/coordination.
> To better support multi-tenancy use cases like the above, we propose to
> manage RPC server resource usage via a coupon mechanism integrated with
> YARN. The idea is to allow YARN to request HDFS storage resource coupons
> (e.g., namenode RPC calls, datanode I/O bandwidth) from the namenode on
> behalf of the job at submission time. Once granted, the tasks will include
> the coupon identifier in the RPC header for subsequent calls. The HDFS
> namenode RPC scheduler maintains the state of the coupon usage based on the
> scheduler policy (fairness or priority) to match the RPC priority with the
> YARN scheduling priority.
> I will post a proposal with more detail shortly.
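Purely as an illustration of the coupon flow described above (none of these
classes exist in Hadoop; they only mirror the proposal's wording):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: coupons granted at job submission map a coupon
// identifier (carried in the RPC header) to an RPC priority on the namenode.
public class CouponScheduler {
  private final Map<String, Integer> grantedPriority =
      new ConcurrentHashMap<>();
  private final int defaultLowestPriority;

  public CouponScheduler(int defaultLowestPriority) {
    this.defaultLowestPriority = defaultLowestPriority;
  }

  /** Called when YARN obtains a coupon for a job at submission time. */
  public void grantCoupon(String couponId, int priority) {
    grantedPriority.put(couponId, priority);
  }

  /** Called per RPC; couponId comes from the RPC header, or null if absent. */
  public int priorityFor(String couponId) {
    if (couponId == null) {
      return defaultLowestPriority;  // no coupon: current behavior applies
    }
    return grantedPriority.getOrDefault(couponId, defaultLowestPriority);
  }
}
{code}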