[
https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Li updated HADOOP-10281:
------------------------------
Attachment: HADOOP-10281.patch
Hi Arpit,
I agree on #1; I've had this idea in mind for a while too: keeping just a map of
counts that decays every few seconds. I will try it out and see how performance
compares.
HADOOP-10286 allows performance to be measured under different user loads with
RPCCallBenchmark, so I will use that to benchmark the performance hit.
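Roughly the shape I have in mind (just a sketch; the class and member names
below are mine, not from any patch, and the decay sweep is simplified):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch: per-user call counts that decay on a fixed schedule. */
public class DecayingCallCounter {
  private final ConcurrentHashMap<String, AtomicLong> counts =
      new ConcurrentHashMap<String, AtomicLong>();
  private final double decayFactor;   // e.g. 0.5
  private final ScheduledExecutorService sweeper =
      Executors.newSingleThreadScheduledExecutor();

  public DecayingCallCounter(double decayFactor, long periodMs) {
    this.decayFactor = decayFactor;
    sweeper.scheduleAtFixedRate(new Runnable() {
      public void run() { decayAll(); }
    }, periodMs, periodMs, TimeUnit.MILLISECONDS);
  }

  /** Called once per incoming RPC; returns the caller's current count. */
  public long increment(String user) {
    AtomicLong count = counts.get(user);
    if (count == null) {
      count = new AtomicLong();
      AtomicLong prev = counts.putIfAbsent(user, count);
      if (prev != null) {
        count = prev;
      }
    }
    return count.incrementAndGet();
  }

  /** Multiply every count by the decay factor; drop users that reach zero.
   *  Races with increment() can lose an update, which is tolerable for
   *  scheduling purposes. */
  private void decayAll() {
    for (Map.Entry<String, AtomicLong> e : counts.entrySet()) {
      long decayed = (long) (e.getValue().get() * decayFactor);
      if (decayed == 0) {
        counts.remove(e.getKey());
      } else {
        e.getValue().set(decayed);
      }
    }
  }

  public long get(String user) {
    AtomicLong count = counts.get(user);
    return count == null ? 0 : count.get();
  }
}
{code}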
Other:
{quote} If there is not enough traffic to flush the callHistory, sporadic users
could end up with counts greater than the heartbeats {quote}
What can happen is: if I'm the only user on the cluster and I make one call per
second, I shouldn't be punished, since I'm not hitting the NN hard. However,
over 1000 seconds I will fill up the callHistory and end up in the low-priority
queue, while heartbeats from datanodes are placed in a higher queue than me, so
I'm being punished anyway.
This issue is fixed if we add a decay constant and make the counts decay over
time.
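For illustration (numbers are made up, not from any patch): if each user's count
is incremented per call and then halved every 10 seconds, a lone user making one
call per second settles at a count oscillating between roughly 10 and 20,
instead of eventually filling a 1000-entry history, so datanode heartbeats no
longer end up outranking them.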
{quote} I also think we are providing too many configuration knobs with this
feature. Hadoop performance tuning is quite complex already and additional
settings add administrator and debugging overhead{quote}
Agreed that this is hard to tune; it's still pretty experimental, so we want to
be able to adjust these things. From what I understand, [~mingma] has modified
the scheduler to make configuration extremely easy for the administrator, who
just specifies a target queue utilization; the scheduler and mux work together
to hit that target.
{quote} "Is there any analysis behind choosing this bisection approach?" {quote}
Sampling real-world traffic in our clusters shows that call volume across
different users falls into exponentially spaced tiers, so log(usage) is roughly
linear. Some of this data is on the 5th slide here:
https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf
(it may be hard to read in pie-chart form)
However, we've left this configurable, since different clusters may have
different usage patterns.
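To make the bisection concrete, here is a toy illustration (the thresholds,
numbers, and class name are for illustration only, not the patch's defaults):
{code:java}
/**
 * Toy illustration of bisection: thresholds at 1/2, 1/4, 1/8, ... of
 * total recent calls. Not the actual scheduler code.
 */
public class BisectionExample {
  static int priorityLevel(long userCalls, long totalCalls, int numLevels) {
    double share = totalCalls == 0 ? 0 : (double) userCalls / totalCalls;
    double threshold = 0.5;
    for (int level = numLevels - 1; level > 0; level--) {
      if (share >= threshold) {
        return level;       // heavier users -> higher (lower-priority) level
      }
      threshold /= 2;
    }
    return 0;               // lightest users -> highest priority
  }

  public static void main(String[] args) {
    // Shares of 60%, 20%, 10%, 2% with 4 levels land in levels 3, 1, 0, 0.
    System.out.println(priorityLevel(60, 100, 4));  // 3
    System.out.println(priorityLevel(20, 100, 4));  // 1
    System.out.println(priorityLevel(10, 100, 4));  // 0
    System.out.println(priorityLevel(2, 100, 4));   // 0
  }
}
{code}
With exponentially spread usage, the halving thresholds land roughly one tier
per level, which is why the log-scale boundaries felt natural.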
{quote} "Why do we support multiple identity providers if we discard all but
the first?" {quote}
I found a method `conf.getInstances()` but no `conf.getInstance()`; I'm not sure
it warrants patching Configuration.java.
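What I'm doing for now is roughly the following (a sketch only; the config key
and the default class are placeholders, and IdentityProvider stands in for the
provider interface from this patch series):
{code:java}
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.IdentityProvider;
import org.apache.hadoop.ipc.UserIdentityProvider;

public class ProviderLookup {
  // Configuration has getInstances() but no single-instance variant,
  // so take the first configured provider and ignore the rest.
  // The config key and the fallback class are illustrative placeholders.
  static IdentityProvider firstProvider(Configuration conf, String key) {
    List<IdentityProvider> providers =
        conf.getInstances(key, IdentityProvider.class);
    return providers.isEmpty() ? new UserIdentityProvider()
                               : providers.get(0);
  }
}
{code}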
I've uploaded a new version of the patch with code style fixes, but I'm going
to work on the counting+decay approach for the next patch.
> Create a scheduler, which assigns schedulables a priority level
> ---------------------------------------------------------------
>
> Key: HADOOP-10281
> URL: https://issues.apache.org/jira/browse/HADOOP-10281
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Chris Li
> Assignee: Chris Li
> Attachments: HADOOP-10281.patch, HADOOP-10281.patch,
> HADOOP-10281.patch
>
>
> The Scheduler decides which sub-queue to assign a given Call. It implements a
> single method getPriorityLevel(Schedulable call) which returns an integer
> corresponding to the subqueue the FairCallQueue should place the call in.
> The HistoryRpcScheduler is one such implementation; it uses the username of
> each call to determine what % of calls in recent history were made by that
> user.
> It is configured with a historyLength (how many calls to track) and a list of
> integer thresholds which determine the boundaries between priority levels.
> For instance, if the scheduler has a historyLength of 8; and priority
> thresholds of 4,2,1; and saw calls made by these users in order:
> Alice, Bob, Alice, Alice, Bob, Jerry, Alice, Alice
> * Another call by Alice would be placed in queue 3, since she has already
> made >= 4 calls
> * Another call by Bob would be placed in queue 2, since he has made >= 2 but
> fewer than 4 calls
> * A call by Carlos would be placed in queue 0, since he has no calls in the
> history
> Also, some versions of this patch include the concept of a 'service user',
> which is a user that is always scheduled at high priority. Currently this
> seems redundant and will probably be removed in later patches, since it's not
> too useful.
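For illustration only, a toy version of the history-based logic described above
(it keys on a plain username rather than a Schedulable, and everything except
getPriorityLevel is a made-up name, not the actual HistoryRpcScheduler code):
{code:java}
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy version of the history-based priority logic described above. */
public class HistoryExample {
  private final int historyLength;      // e.g. 8
  private final int[] thresholds;       // e.g. {4, 2, 1}, descending
  private final Queue<String> history = new ArrayDeque<String>();
  private final Map<String, Integer> counts = new HashMap<String, Integer>();

  public HistoryExample(int historyLength, int[] thresholds) {
    this.historyLength = historyLength;
    this.thresholds = thresholds;
  }

  /** Record a call and return the priority level for the caller. */
  public int getPriorityLevel(String user) {
    int count = counts.containsKey(user) ? counts.get(user) : 0;
    // Priority is based on the calls already in the history window.
    int level = 0;
    for (int i = 0; i < thresholds.length; i++) {
      if (count >= thresholds[i]) {
        level = thresholds.length - i;   // {4,2,1} -> levels 3,2,1
        break;
      }
    }
    // Slide the window: add this call, evict the oldest if over capacity.
    history.add(user);
    counts.put(user, count + 1);
    if (history.size() > historyLength) {
      String evicted = history.remove();
      int c = counts.get(evicted) - 1;
      if (c == 0) { counts.remove(evicted); } else { counts.put(evicted, c); }
    }
    return level;
  }

  public static void main(String[] args) {
    HistoryExample s = new HistoryExample(8, new int[] {4, 2, 1});
    for (String u : new String[] {"Alice", "Bob", "Alice", "Alice",
                                  "Bob", "Jerry", "Alice", "Alice"}) {
      s.getPriorityLevel(u);
    }
    System.out.println(s.getPriorityLevel("Alice"));   // 3 (>= 4 calls)
    System.out.println(s.getPriorityLevel("Bob"));     // 2 (>= 2 calls)
    System.out.println(s.getPriorityLevel("Carlos"));  // 0 (no history)
  }
}
{code}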
--
This message was sent by Atlassian JIRA
(v6.2#6252)