Wei Yan created HADOOP-15016:
--------------------------------
Summary: Add reservation support to RPC FairCallQueue
Key: HADOOP-15016
URL: https://issues.apache.org/jira/browse/HADOOP-15016
Project: Hadoop Common
Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Normal
FairCallQueue was introduced to provide RPC resource fairness among different
users. In the current implementation, each user is weighted equally, and the
processing priority of a user's RPC calls is based on how many requests that
user has already sent. This works well when the cluster is shared among several
end-users.
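As a rough illustration of the behavior described above, the sketch below maps a user's share of already-seen calls to a priority level, where level 0 is the highest priority. The function name and the threshold values are assumptions made for this example only; they are not taken from the FairCallQueue code or configuration.

```python
def priority_level(user_calls, total_calls, thresholds=(0.125, 0.25, 0.5)):
    """Return a priority level for a user; 0 is highest priority.

    thresholds are illustrative assumptions: each threshold the user's
    call share reaches pushes the user one level lower in priority.
    """
    if total_calls == 0:
        return 0  # no history yet, treat the user as highest priority
    share = user_calls / total_calls
    level = 0
    for threshold in thresholds:
        if share >= threshold:
            level += 1
    return level
```

With four queues, a user responsible for most of the traffic lands in the lowest-priority queue, which is exactly what happens to a heavy service account under this scheme.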
However, this has some limitations when a cluster is shared among both
end-users and service jobs, such as ETL jobs that run under a service
account and need to issue a large number of RPC calls. When the NameNode becomes
busy, these jobs are easily backed off and deprioritized. We cannot simply
treat this type of job as a "bad" user who randomly issues too many calls, as
their calls are legitimate. It is also unfair to weight an end-user and a heavy
service user equally when allocating RPC resources.
One idea here is to introduce reservation support for RPC resources. That is,
for some services, we reserve some RPC resources for their calls. This idea is
very similar to how YARN manages CPU/memory resources among different resource
queues.
In more detail: along with the existing FairCallQueue setup (for example, 4
queues with different priorities), we would add some additional special queues,
one for each special service user. For each special service user, we provide a
guaranteed RPC share (for example 10%, which can be aligned with its YARN
resource share), and this percentage is converted to a weight used in
WeightedRoundRobinMultiplexer.
A quick example: we have 4 default queues with default weights (8, 4, 2, 1),
and two special service users (user1 with a 10% share, and user2 with a 15%
share). The default weights sum to 15 and cover the remaining 75% of the RPC
resources, so each special queue's weight is 15 * share / 75%. We end up with 6
queues: 4 default queues (with weights 8, 4, 2, 1) and 2 special queues
(user1Queue weighted 15*10%/75%=2, and user2Queue weighted 15*15%/75%=3).
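The share-to-weight conversion in the example above can be sketched as follows. This is only an illustration of the arithmetic; the function name and the dictionary-based input are assumptions, not part of the existing multiplexer.

```python
from fractions import Fraction

def reserved_weights(default_weights, shares):
    """Convert reserved RPC shares (in percent) into multiplexer weights.

    default_weights: weights of the existing priority queues, e.g. [8, 4, 2, 1]
    shares: hypothetical mapping of service user -> reserved share in percent
    """
    total_reserved = sum(shares.values())
    if not 0 <= total_reserved < 100:
        raise ValueError("reserved shares must sum to less than 100%")
    default_total = sum(default_weights)
    remaining = 100 - total_reserved  # percent left for the default queues
    # Solve weight / default_total == share / remaining for each user.
    return {user: Fraction(default_total * pct, remaining)
            for user, pct in shares.items()}

weights = reserved_weights([8, 4, 2, 1], {"user1": 10, "user2": 15})
```

Here the total weight is 15 + 2 + 3 = 20, so user1 gets 2/20 = 10% and user2 gets 3/20 = 15%. Other share combinations may produce fractional weights; if the multiplexer only accepts integer weights, all weights would need to be scaled or rounded, which this sketch leaves open.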
Incoming RPC calls from special service users are placed directly into the
corresponding reserved queue; all other calls follow the current
implementation.
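The routing rule can be sketched as below. The class and method names are hypothetical and chosen for this example; the priority passed in for non-reserved users stands for whatever level the existing scheduler would assign.

```python
from collections import deque

class ReservedCallQueue:
    """Toy model: default priority queues plus one queue per reserved user."""

    def __init__(self, num_default_queues, reserved_users):
        self.default_queues = [deque() for _ in range(num_default_queues)]
        self.reserved_queues = {user: deque() for user in reserved_users}

    def put(self, user, call, priority):
        # Reserved users bypass priority-based placement entirely.
        queue = self.reserved_queues.get(user)
        if queue is None:
            # Everyone else follows the existing priority-based placement.
            queue = self.default_queues[priority]
        queue.append(call)

q = ReservedCallQueue(4, ["user1"])
q.put("user1", "call-a", priority=3)  # goes to user1's reserved queue
q.put("alice", "call-b", priority=3)  # goes to default queue 3
```

With an empty list of reserved users, every call takes the default path, which matches the proposed default behavior.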
By default, there is no special user, and all RPC requests follow the existing
FairCallQueue implementation.
I would like to hear comments on this approach, and would also like to know
whether there are better solutions.