[
https://issues.apache.org/jira/browse/HADOOP-17421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247729#comment-17247729
]
Janus Chow commented on HADOOP-17421:
-------------------------------------
Uploaded [^HADOOP-17421.002.patch] with the following updates:
# Added a new metric that counts all requests handled for static users. The
cost time of static users is not collected for the current and last windows,
because a static user's queue does not change at each decay. In addition, NN
performance improved somewhat once the many static users' requests were left
out of the calculation.
# Updated how the Map that stores the static users is built, in preparation
for supporting reloading of static users.
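
As a rough illustration of the two updates above (not the code in the patch), a reloadable static-user map with a request counter could look like the sketch below. The class name StaticUserRegistry and all of its method names are hypothetical. The sketch assumes a reload replaces the whole map at once, so readers never see a half-built mapping.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder for the static user-to-queue mapping; not the patch's class. */
final class StaticUserRegistry {
  // Immutable snapshot, replaced wholesale on reload so readers never
  // observe a partially built map.
  private volatile Map<String, Integer> userToQueue = Collections.emptyMap();

  // Total number of requests handled for static users. No per-window cost
  // is kept for them, since their queue does not change at each decay.
  private final AtomicLong staticUserRequests = new AtomicLong();

  /** Copies the given entries and publishes the copy with one volatile write. */
  void reload(Map<String, Integer> configured) {
    userToQueue = Collections.unmodifiableMap(new HashMap<>(configured));
  }

  /** Returns the configured queue for a static user, or -1 if the user is not static. */
  int lookup(String user) {
    Integer queue = userToQueue.get(user);
    if (queue == null) {
      return -1;
    }
    staticUserRequests.incrementAndGet();
    return queue;
  }

  /** Number of lookups so far that matched a static user. */
  long getStaticUserRequests() {
    return staticUserRequests.get();
  }
}
```

Because reload() copies its input, later changes to the caller's map have no effect until reload() is called again.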
> Specify user's queue via configuration in FairCallQueue
> --------------------------------------------------------
>
> Key: HADOOP-17421
> URL: https://issues.apache.org/jira/browse/HADOOP-17421
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Janus Chow
> Assignee: Janus Chow
> Priority: Major
> Attachments: HADOOP-17421.001.patch, HADOOP-17421.002.patch,
> static_user_performance_test.png
>
>
> FairCallQueue helps a lot in maintaining fair, good service in a
> multi-tenant cluster: each user is assigned to one of several queues with
> different priorities. In production, however, we ran into some cases that
> the automatic assignment does not fit:
> # We have a service account that sends a large volume of NN requests. For
> various reasons, we would like to keep this user and allow it to keep this
> volume of operations. When we deployed FairCallQueue, this service user was
> treated as a bad user and assigned to a lower-priority queue, causing some
> slowness for the service account.
> # We have more and more Flink jobs writing checkpoints to our NN. Checkpoint
> operations impose a periodically high cost on the NN, at an interval of
> several minutes. FairCallQueue (with cost-based scheduling enabled) does not
> control this kind of operation well: when the operations start, the user's
> cost in the decay window is still quite low, so the user is assigned to
> queue 0. After some windows, by the time the user's high cost has been
> noticed and the user has been moved to a lower-priority queue, the user's
> operations have already finished.
> For problem 1, we noticed that there is already an option mentioned in
> HADOOP-17165, but in our case the service account is not important enough
> for us to let it always be assigned to queue 0.
> To solve these problems, we would like to propose specifying the queue for
> certain static users via configuration. The basic design is as follows:
> * Specify the static users in config for each queue.
> * Load the mapping from the config while initializing the callqueue.
> * Check the configured queue for each user when assigning the queue.
> * The cost time of the static users is not counted in the decay
> calculation, to mitigate the impact on other normal users' costs.
>
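
The design steps quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch. The class StaticQueueAssigner and its methods are hypothetical, and the configuration is assumed to arrive as a map from queue index to a comma-separated user list, since the actual config key format is not given here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of static queue assignment; not thread-safe. */
final class StaticQueueAssigner {
  // user name -> statically configured queue index
  private final Map<String, Integer> staticUsers = new HashMap<>();
  // accumulated cost per non-static user (stands in for the decay bookkeeping)
  private final Map<String, Long> trackedCost = new HashMap<>();

  /** Loads the mapping once, from an assumed "queue index -> user list" config. */
  StaticQueueAssigner(int numQueues, Map<Integer, String> usersPerQueue) {
    for (Map.Entry<Integer, String> entry : usersPerQueue.entrySet()) {
      int queue = entry.getKey();
      if (queue < 0 || queue >= numQueues) {
        throw new IllegalArgumentException("Invalid queue index: " + queue);
      }
      for (String user : entry.getValue().split(",")) {
        String trimmed = user.trim();
        if (!trimmed.isEmpty()) {
          staticUsers.put(trimmed, queue);
        }
      }
    }
  }

  /** Records cost only for non-static users, so static users do not skew the totals. */
  void addCost(String user, long cost) {
    if (!staticUsers.containsKey(user)) {
      trackedCost.merge(user, cost, Long::sum);
    }
  }

  /** Returns the cost recorded so far for the user (always 0 for static users). */
  long getTrackedCost(String user) {
    return trackedCost.getOrDefault(user, 0L);
  }

  /** Uses the configured queue if the user is static, else the computed one. */
  int assignQueue(String user, int computedQueue) {
    Integer fixed = staticUsers.get(user);
    return fixed != null ? fixed : computedQueue;
  }
}
```

With a config of {2: "svc_account, etl"}, assignQueue("svc_account", 0) yields 2 regardless of the computed priority, while users not listed keep whatever queue the scheduler computed for them.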