[ 
https://issues.apache.org/jira/browse/HADOOP-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347132#comment-16347132
 ] 

Wei Yan commented on HADOOP-15016:
----------------------------------

Sorry, [~xyao], I missed your previous comment...
{quote}bq.  1. This can be a useful feature for multi-tenancy Hadoop cluster. 
The cost estimates for different RPC calls can be difficult. Instead of 
hardcode fixed value per RPC, I would suggest making it a pluggable interface 
so that we can customize it for different deployments.
{quote}
Agreed. The cost calculation will be pluggable.
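
As a rough illustration of what "pluggable" could look like here, the sketch below defines a small cost interface with two interchangeable implementations. All names (RpcCostEstimator, UniformCostEstimator, TableCostEstimator) and the cost values are hypothetical and are not taken from the POC patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plug-in point: none of these names come from the POC patch.
interface RpcCostEstimator {
    /** Returns the cost to charge for one call to the given RPC method. */
    int costOf(String rpcMethod);
}

/** Charges every call the same cost, i.e. plain request counting. */
class UniformCostEstimator implements RpcCostEstimator {
    @Override
    public int costOf(String rpcMethod) {
        return 1;
    }
}

/** Looks costs up in a per-deployment table, falling back to a default. */
class TableCostEstimator implements RpcCostEstimator {
    private final Map<String, Integer> costs;
    private final int defaultCost;

    TableCostEstimator(Map<String, Integer> costs, int defaultCost) {
        this.costs = new HashMap<>(costs); // defensive copy
        this.defaultCost = defaultCost;
    }

    @Override
    public int costOf(String rpcMethod) {
        return costs.getOrDefault(rpcMethod, defaultCost);
    }
}

class CostEstimatorDemo {
    public static void main(String[] args) {
        Map<String, Integer> table = new HashMap<>();
        table.put("getBlockLocations", 1); // illustrative values only
        table.put("getListing", 5);
        RpcCostEstimator estimator = new TableCostEstimator(table, 2);
        System.out.println(estimator.costOf("getListing")); // prints 5
        System.out.println(estimator.costOf("mkdirs"));     // prints 2 (default)
    }
}
```

A deployment could then select the implementation by class name in its configuration, although the exact mechanism is left open here.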
{quote}bq. 2. The reserved share of call queue looks good. It is similar what 
we proposed in HADOOP-13128. What do we plan to handle the case when the 
reserved queue is full? blocking or backoff?
{quote}
Currently I'm thinking about backoff, the same behavior the existing queues 
have when they are full.
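
To make the backoff option concrete, here is a minimal sketch of a bounded reserved queue that rejects a call when full instead of blocking the caller. ReservedCallQueue is a hypothetical class for illustration only; how the rejection is turned into a retry signal for the client is assumed to be handled elsewhere.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical bounded queue for one reserved service user. */
class ReservedCallQueue<E> {
    private final BlockingQueue<E> queue;

    ReservedCallQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Non-blocking enqueue. Returns false when the queue is full, so the
     * caller can ask the client to back off and retry rather than block.
     */
    boolean tryEnqueue(E call) {
        return queue.offer(call);
    }

    /** Removes and returns the oldest call, or null if the queue is empty. */
    E poll() {
        return queue.poll();
    }
}

class ReservedCallQueueDemo {
    public static void main(String[] args) {
        ReservedCallQueue<String> q = new ReservedCallQueue<>(2);
        System.out.println(q.tryEnqueue("call-1")); // true
        System.out.println(q.tryEnqueue("call-2")); // true
        System.out.println(q.tryEnqueue("call-3")); // false: full, back off
    }
}
```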
{quote}bq. 3. The feature might need many manual configurations and tune to 
work for specific deployment and workloads. Do you want to add a section to 
discuss configurations, CLI tools, etc. to make this easier to use?
{quote}
Yes. I'm looking for a mathematical model to calculate the cost of different 
RPC calls, based on historical access patterns. This could serve as a suggested 
starting point for users. We may also need to build a simulation tool that 
replays the historical RPC log to verify different configurations.
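
One very simple candidate for such a model, shown only as a sketch and not as the model I will end up proposing: take the mean processing time per RPC method from the historical log and express it relative to the cheapest method. The class name, the input shape (parallel arrays of method names and microseconds), and the rounding rule are all assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

class HistoricalCostModel {
    /**
     * Derives a relative cost per RPC method from (method, micros) samples:
     * mean processing time per method divided by the smallest mean, rounded
     * to the nearest integer and never less than 1.
     */
    static Map<String, Long> relativeCosts(String[] methods, long[] micros) {
        // Accumulate {sum, count} per method.
        Map<String, long[]> sumAndCount = new TreeMap<>();
        for (int i = 0; i < methods.length; i++) {
            long[] acc = sumAndCount.computeIfAbsent(methods[i], k -> new long[2]);
            acc[0] += micros[i];
            acc[1]++;
        }
        // Mean per method, floored at 1 microsecond to avoid dividing by zero.
        Map<String, Double> means = new TreeMap<>();
        double minMean = Double.MAX_VALUE;
        for (Map.Entry<String, long[]> e : sumAndCount.entrySet()) {
            double mean = Math.max(1.0, (double) e.getValue()[0] / e.getValue()[1]);
            means.put(e.getKey(), mean);
            minMean = Math.min(minMean, mean);
        }
        // Normalize so that the cheapest method costs 1.
        Map<String, Long> costs = new TreeMap<>();
        for (Map.Entry<String, Double> e : means.entrySet()) {
            costs.put(e.getKey(), Math.max(1L, Math.round(e.getValue() / minMean)));
        }
        return costs;
    }

    public static void main(String[] args) {
        // Made-up samples, for illustration only.
        String[] methods = {"getFileInfo", "getFileInfo", "getListing"};
        long[] micros = {100, 300, 1000};
        System.out.println(relativeCosts(methods, micros));
    }
}
```

A replay tool could feed the same log through different cost tables and compare the resulting queue behavior.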
{quote}bq. 4. It would be great if you could share some of the results achieved 
with the POC patch (e.g., RPC/second, average locking, process and queue time 
with/wo the patch).
{quote}
I'm busy with some other projects. I will post some results around next month.

> Cost-Based RPC FairCallQueue with Reservation support
> -----------------------------------------------------
>
>                 Key: HADOOP-15016
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15016
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>            Priority: Major
>         Attachments: Adding reservation support to NameNode RPC resource.pdf, 
> Adding reservation support to NameNode RPC resource_v2.pdf, 
> HADOOP-15016_poc.patch
>
>
> FairCallQueue was introduced to provide RPC resource fairness among different 
> users. In the current implementation, each user is weighted equally, and the 
> processing priority of an RPC call is based on how many requests its user 
> has sent before. This works well when the cluster is shared among several 
> end-users.
> However, this has some limitations when a cluster is shared among both 
> end-users and service jobs, such as ETL jobs that run under a service 
> account and need to issue lots of RPC calls. When the NameNode becomes 
> busy, these jobs are easily backed off and deprioritized. We cannot simply 
> treat this type of job as a "bad" user who randomly issues too many calls, 
> as their calls are normal calls. Also, it is unfair to weight an end-user 
> and a heavy service user equally when allocating RPC resources.
> One idea here is to introduce reservation support for RPC resources. That 
> is, for some services, we reserve some RPC resources for their calls. This 
> idea is very similar to how YARN manages CPU/memory resources among 
> different resource queues. A little more detail: along with the existing 
> FairCallQueue setup (like using 4 queues with different priorities), we would 
> add some additional special queues, one for each special service user. For 
> each special service user, we provide a guaranteed RPC share (like 10%, which 
> can be aligned with its YARN resource share), and this percentage can be 
> converted to a weight used in WeightedRoundRobinMultiplexer. A quick example: 
> we have 4 default queues with default weights (8, 4, 2, 1), and two special 
> service users (user1 with a 10% share, and user2 with a 15% share). The 
> default queues' total weight of 15 then corresponds to the remaining 75%, so 
> we end up with 6 queues: 4 default queues (with weights 8, 4, 2, 1) and 2 
> special queues (user1Queue weighted 15*10%/75%=2, and user2Queue weighted 
> 15*15%/75%=3).
> Incoming RPC calls from special service users are put directly into the 
> corresponding reserved queue; other calls follow the current 
> implementation.
> By default, there are no special users and all RPC requests follow the 
> existing FairCallQueue implementation.
> I would like to hear comments on this approach, and whether there are any 
> better solutions. I will post a detailed design once I get some early 
> comments.
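
The weight arithmetic and the routing rule from the description above can be sketched as follows. ReservedQueueLayout and its methods are hypothetical names for illustration, not part of the POC patch. The sketch assumes shares are given as integer percentages, that reserved queues are placed after the default ones, and that a reserved queue never gets a weight below 1.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class ReservedQueueLayout {
    private final int[] weights; // default weights first, then reserved ones
    private final Map<String, Integer> reservedIndex = new LinkedHashMap<>();

    ReservedQueueLayout(int[] defaultWeights, Map<String, Integer> sharePercent) {
        int defaultSum = 0;
        for (int w : defaultWeights) {
            defaultSum += w;
        }
        int reservedTotal = 0;
        for (int pct : sharePercent.values()) {
            reservedTotal += pct;
        }
        if (reservedTotal >= 100) {
            throw new IllegalArgumentException("reserved shares must total < 100%");
        }
        int next = defaultWeights.length;
        weights = Arrays.copyOf(defaultWeights, next + sharePercent.size());
        for (Map.Entry<String, Integer> e : sharePercent.entrySet()) {
            // The default weights together stand for the unreserved share,
            // so weight = defaultSum * share / (100% - total reserved share).
            long w = Math.round((double) defaultSum * e.getValue() / (100 - reservedTotal));
            weights[next] = (int) Math.max(1, w); // assumption: minimum weight 1
            reservedIndex.put(e.getKey(), next++);
        }
    }

    int[] weights() {
        return weights.clone();
    }

    /** Reserved users go to their own queue; others keep their priority level. */
    int queueFor(String user, int priorityLevel) {
        Integer idx = reservedIndex.get(user);
        return idx != null ? idx : priorityLevel;
    }

    public static void main(String[] args) {
        Map<String, Integer> shares = new LinkedHashMap<>();
        shares.put("user1", 10);
        shares.put("user2", 15);
        ReservedQueueLayout layout = new ReservedQueueLayout(new int[] {8, 4, 2, 1}, shares);
        System.out.println(Arrays.toString(layout.weights())); // [8, 4, 2, 1, 2, 3]
        System.out.println(layout.queueFor("user2", 0));       // 5
    }
}
```

With no reserved users, the layout reduces to the default weights and every call keeps its existing priority level.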



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
