[
https://issues.apache.org/jira/browse/HADOOP-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347132#comment-16347132
]
Wei Yan commented on HADOOP-15016:
----------------------------------
Sorry, [~xyao], I missed your previous comment...
{quote}bq. 1. This can be a useful feature for multi-tenancy Hadoop cluster.
Estimating the cost of different RPC calls can be difficult. Instead of
hardcoding a fixed value per RPC, I would suggest making it a pluggable interface
so that we can customize it for different deployments.
{quote}
Agreed. The cost calculation will be pluggable.
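As a rough illustration of what "pluggable" could look like, here is a minimal sketch. The names RpcCostProvider and TableCostProvider and the cost values are hypothetical, chosen only for this example; they are not taken from the POC patch or from any existing Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical plug-in point: maps an RPC method name to a relative cost. */
interface RpcCostProvider {
  long costOf(String rpcMethod);
}

/** Hypothetical table-driven provider; unknown methods get a default cost. */
final class TableCostProvider implements RpcCostProvider {
  private final Map<String, Long> costs;
  private final long defaultCost;

  TableCostProvider(Map<String, Long> costs, long defaultCost) {
    // Defensive copy so later changes to the caller's map have no effect.
    this.costs = new HashMap<>(costs);
    this.defaultCost = defaultCost;
  }

  @Override
  public long costOf(String rpcMethod) {
    return costs.getOrDefault(rpcMethod, defaultCost);
  }
}
```

A deployment could then supply its own table (or a different implementation altogether) without touching the scheduler code.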
{quote}bq. 2. The reserved share of call queue looks good. It is similar to what
we proposed in HADOOP-13128. How do we plan to handle the case when the
reserved queue is full: blocking or backoff?
{quote}
Currently I'm thinking of backoff, the same way the existing queues behave
when they are full.
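To make the blocking-versus-backoff distinction concrete, here is a small sketch using only java.util.concurrent. The class ReservedQueueBackoff and its BackoffException are hypothetical stand-ins; how the real server signals backoff to a client is not shown here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical bounded reserved queue that rejects calls instead of blocking. */
final class ReservedQueueBackoff {

  /** Hypothetical signal telling the caller to retry later. */
  static final class BackoffException extends RuntimeException {
    BackoffException(String message) {
      super(message);
    }
  }

  private final BlockingQueue<String> queue;

  ReservedQueueBackoff(int capacity) {
    this.queue = new ArrayBlockingQueue<>(capacity);
  }

  /**
   * Backoff: offer() returns false immediately when the queue is full,
   * whereas put() would block the calling thread until space frees up.
   */
  void add(String call) {
    if (!queue.offer(call)) {
      throw new BackoffException("reserved queue is full, retry later");
    }
  }

  int size() {
    return queue.size();
  }
}
```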
{quote}bq. 3. The feature might need a lot of manual configuration and tuning to
work for specific deployments and workloads. Do you want to add a section to
discuss configurations, CLI tools, etc. to make this easier to use?
{quote}
Yes. I'm looking for a mathematical model to calculate the cost of different RPC
calls, based on historical access patterns. Its output could serve as a
suggested starting point for users. We may also need to build a simulation tool
that replays historical RPC logs to verify different configurations.
{quote}bq. 4. It would be great if you could share some of the results achieved
with the POC patch (e.g., RPC/second, average locking, processing, and queue
time with and without the patch).
{quote}
I'm busy with some other projects at the moment. I'll post some results around next month.
> Cost-Based RPC FairCallQueue with Reservation support
> -----------------------------------------------------
>
> Key: HADOOP-15016
> URL: https://issues.apache.org/jira/browse/HADOOP-15016
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Wei Yan
> Assignee: Wei Yan
> Priority: Major
> Attachments: Adding reservation support to NameNode RPC resource.pdf,
> Adding reservation support to NameNode RPC resource_v2.pdf,
> HADOOP-15016_poc.patch
>
>
> FairCallQueue was introduced to provide RPC resource fairness among different
> users. In the current implementation, each user is weighted equally, and the
> processing priority of a user's RPC calls is based on how many requests
> that user has sent before. This works well when the cluster is shared among
> several end-users.
> However, this has some limitations when a cluster is shared among both
> end-users and some service jobs, like some ETL jobs which run under a service
> account and need to issue lots of RPC calls. When the NameNode becomes quite
> busy, this set of jobs can easily be backed off and deprioritized. We cannot
> simply treat this type of job as a "bad" user who randomly issues too many
> calls, as their calls are normal calls. Also, it is unfair to weight an
> end-user and a heavy service user equally when allocating RPC resources.
> One idea here is to introduce reservation support to RPC resources. That is,
> for some services, we reserve some RPC resources for their calls. This idea
> is very similar to how YARN manages CPU/memory resources among different
> resource queues. In more detail: along with the existing FairCallQueue setup
> (like using 4 queues with different priorities), we would add some
> additional special queues, one for each special service user. For each
> special service user, we provide a guaranteed RPC share (like 10%, which can
> be aligned with its YARN resource share), and this percentage can be
> converted to a weight used in WeightedRoundRobinMultiplexer. A quick
> example: we have 4 default queues with default weights (8, 4, 2, 1), which
> sum to 15, and two special service users (user1 with a 10% share and user2
> with a 15% share), which leaves 75% for the default queues. So finally we'll
> have 6 queues: 4 default queues (with weights 8, 4, 2, 1) and 2 special
> queues (user1Queue weighted 15*10%/75%=2, and user2Queue weighted
> 15*15%/75%=3).
> Incoming RPC calls from special service users are put directly into the
> corresponding reserved queue; all other calls follow the current
> implementation.
> By default, there is no special user and all RPC requests follow the existing
> FairCallQueue implementation.
> I would like to hear more comments on this approach, and about any better
> alternatives. I will post a detailed design once I get some early
> comments.
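
The share-to-weight conversion in the quoted description can be sketched as follows. The class and method names are hypothetical, and rounding to the nearest integer with a floor of 1 is an assumption of this sketch; the description only gives an example where the division comes out exact.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of Hadoop or the POC patch. */
final class ReservedQueueWeights {

  /**
   * Converts reserved RPC shares (fractions such as 0.10) into multiplexer
   * weights. The default queues keep their weights and together own the
   * unreserved share, so: weight = sum(defaultWeights) * share / unreserved.
   */
  static Map<String, Integer> reservedWeights(
      int[] defaultWeights, Map<String, Double> reservedShares) {
    int defaultTotal = 0;
    for (int w : defaultWeights) {
      defaultTotal += w;
    }
    double reservedTotal = 0.0;
    for (double share : reservedShares.values()) {
      reservedTotal += share;
    }
    if (reservedTotal >= 1.0) {
      throw new IllegalArgumentException("reserved shares must sum to less than 100%");
    }
    double unreserved = 1.0 - reservedTotal;

    Map<String, Integer> weights = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : reservedShares.entrySet()) {
      // Assumption: round to nearest, and never let a reserved queue drop to 0.
      long w = Math.round(defaultTotal * e.getValue() / unreserved);
      weights.put(e.getKey(), (int) Math.max(1L, w));
    }
    return weights;
  }
}
```

With default weights (8, 4, 2, 1) and shares of 10% for user1 and 15% for user2, this yields weights 2 and 3, matching the example in the description.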