[ https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Li updated HADOOP-10281:
------------------------------

    Attachment: HADOOP-10281.patch

Hi Arpit,

I agree on #1; I've had the same idea in my head for a while too: just a map of 
counts that decays every few seconds. I will try it out and see how performance 
compares.

HADOOP-10286 lets RPCCallBenchmark measure performance under different user 
loads, so I will use it to benchmark the performance hit.

Other:
{quote} If there is not enough traffic to flush the callHistory, sporadic users 
could end up with counts greater than the heartbeats {quote}

Here is what can happen: if I'm the only user on the cluster and I make one call 
per second, I shouldn't be punished, because I'm not hitting the NN hard. 
However, after 1000 seconds I will have filled up the callHistory, and I will be 
placed in low priority. Heartbeats from datanodes will then be placed in a 
higher-priority queue than my calls, so I am being punished after all.

Adding a decay constant, so that counts depend on elapsed time rather than on 
the number of calls, fixes this issue.
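A minimal Java sketch of the counting + decay idea, under stated assumptions: the class name `DecayingCallCounter`, a decay factor of 0.5, and a decay period of one second are all hypothetical, not taken from the patch. It models the lone user above making one call per second. With a fixed-length call history that user eventually fills every slot, but with time-based decay the count converges to 1 / (1 - 0.5) = 2 and never grows past it.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical time-decayed call counter (a sketch, not the patch's code).
 * A timer calls decay() every period; each user's count is multiplied by
 * decayFactor, so the count tracks recent call rate, not total calls.
 */
class DecayingCallCounter {
  private final Map<String, Double> counts = new HashMap<>();
  private final double decayFactor; // assumed, e.g. 0.5 halves counts each period

  DecayingCallCounter(double decayFactor) {
    this.decayFactor = decayFactor;
  }

  /** Records one call for the user and returns the updated count. */
  synchronized double recordCall(String user) {
    return counts.merge(user, 1.0, Double::sum);
  }

  /** Invoked by a timer every few seconds (period is an assumption). */
  synchronized void decay() {
    counts.replaceAll((user, c) -> c * decayFactor);
    counts.values().removeIf(c -> c < 0.01); // forget users who went idle
  }

  synchronized double getCount(String user) {
    return counts.getOrDefault(user, 0.0);
  }
}

class DecayDemo {
  public static void main(String[] args) {
    DecayingCallCounter counter = new DecayingCallCounter(0.5);
    double peak = 0;
    // Lone user: one call per second, one decay period per second (assumed).
    for (int second = 0; second < 1000; second++) {
      peak = Math.max(peak, counter.recordCall("loneUser"));
      counter.decay();
    }
    // A 1000-entry call history would hold 1000 of this user's calls instead.
    System.out.println("peak count = " + peak); // never exceeds 2.0
  }
}
```

Because counts shrink on a timer rather than being evicted per call, a sporadic user's score stays small no matter how long they keep running, which is exactly the case the fixed-length history gets wrong.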

{quote} I also think we are providing too many configuration knobs with this 
feature. Hadoop performance tuning is quite complex already and additional 
settings add administrator and debugging overhead{quote}

Agreed that this is hard to tune. The feature is still pretty experimental, so 
we want to be able to adjust these settings. From what I understand, [~mingma] 
has modified the scheduler to make configuration extremely easy for the 
administrator, who specifies a target queue utilization; the scheduler and mux 
then work together to hit this target.

{quote} "is there any analysis behind choosing this bisection approach?" {quote}

Sampling real-world usage on our clusters shows that the usage tiers across 
users are exponential, so log(usage) is linear. Some of this data is on the 5th 
slide here (it may be hard to see in pie-chart form): 
https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf

However, we've left this configurable, since different clusters may have 
different usage patterns.
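If "bisection" means that each threshold is half the one above it, as in the 4,2,1 example in the issue description, default thresholds could be derived as below. This is a hypothetical helper (`BisectionThresholds.compute` is not from the patch); the point is that halving spaces the boundaries evenly in log(usage), which fits exponentially tiered usage.

```java
import java.util.Arrays;

class BisectionThresholds {
  /**
   * Hypothetical helper: builds numQueues - 1 descending thresholds by
   * repeatedly halving historyLength, e.g. (8, 4 queues) -> [4, 2, 1].
   * Integer division means very small histories can produce 0 thresholds.
   */
  static int[] compute(int historyLength, int numQueues) {
    int[] thresholds = new int[numQueues - 1];
    int t = historyLength;
    for (int i = 0; i < thresholds.length; i++) {
      t /= 2;            // each boundary is half the previous one
      thresholds[i] = t;
    }
    return thresholds;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(compute(8, 4)));    // [4, 2, 1]
    System.out.println(Arrays.toString(compute(1000, 4))); // [500, 250, 125]
  }
}
```

Clusters whose usage is not exponentially tiered would override these defaults with explicit thresholds.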

{quote} "Why do we support multiple identity providers if we discard all but 
the first?" {quote}

Configuration has a `conf.getInstances()` method but no `conf.getInstance()`, 
so the code takes the first element of the list. I'm not sure whether a single 
provider warrants patching Configuration.java.

I've uploaded a new version of the patch with code style fixes, but I'm going 
to work on using a counting+decay approach for the next patch.

> Create a scheduler, which assigns schedulables a priority level
> ---------------------------------------------------------------
>
>                 Key: HADOOP-10281
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10281
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Chris Li
>            Assignee: Chris Li
>         Attachments: HADOOP-10281.patch, HADOOP-10281.patch, HADOOP-10281.patch
>
>
> The Scheduler decides which sub-queue to assign a given Call. It implements a 
> single method getPriorityLevel(Schedulable call) which returns an integer 
> corresponding to the subqueue the FairCallQueue should place the call in.
> The HistoryRpcScheduler is one such implementation which uses the username of 
> each call and determines what % of calls in recent history were made by this 
> user.
> It is configured with a historyLength (how many calls to track) and a list of 
> integer thresholds which determine the boundaries between priority levels.
> For instance, if the scheduler has a historyLength of 8, priority 
> thresholds of 4,2,1, and saw calls made by these users in order:
> Alice, Bob, Alice, Alice, Bob, Jerry, Alice, Alice
> * Another call by Alice would be placed in queue 3, since she has already 
> made >= 4 calls
> * Another call by Bob would be placed in queue 2, since he has >= 2 but less 
> than 4 calls
> * A call by Carlos would be placed in queue 0, since he has no calls in the 
> history
> Also, some versions of this patch include the concept of a 'service user', 
> a user that is always scheduled at high priority. This currently seems 
> redundant and will probably be removed in later patches, since it's not very 
> useful.
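A compact Java sketch of the HistoryRpcScheduler behavior described above, for illustration only. `Schedulable` here is a stand-in assumed to expose just a user name, `HistoryRpcSchedulerSketch` is a hypothetical name, and scoring a call before adding it to the history is an assumption based on the "has already made" wording. Replaying the example history reproduces the queue assignments for Alice, Bob, and Carlos.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

/** Stand-in for the patch's Schedulable (assumed to expose a user name). */
interface Schedulable {
  String getUserName();
}

/** The single method the issue describes. */
interface RpcScheduler {
  int getPriorityLevel(Schedulable call);
}

/** Tracks the last historyLength callers; maps a user's count to a queue. */
class HistoryRpcSchedulerSketch implements RpcScheduler {
  private final int historyLength;
  private final int[] thresholds; // descending, e.g. {4, 2, 1}
  private final ArrayDeque<String> history = new ArrayDeque<>();
  private final Map<String, Integer> counts = new HashMap<>();

  HistoryRpcSchedulerSketch(int historyLength, int... thresholds) {
    this.historyLength = historyLength;
    this.thresholds = thresholds.clone();
  }

  @Override
  public synchronized int getPriorityLevel(Schedulable call) {
    String user = call.getUserName();
    int level = levelFor(counts.getOrDefault(user, 0));
    record(user); // assumption: the call enters the history after scoring
    return level;
  }

  /** Queue 0 = no recent calls; higher index = heavier recent usage. */
  private int levelFor(int count) {
    for (int i = 0; i < thresholds.length; i++) {
      if (count >= thresholds[i]) {
        return thresholds.length - i;
      }
    }
    return 0;
  }

  /** Appends the caller and evicts the oldest entry once the window is full. */
  private void record(String user) {
    history.addLast(user);
    counts.merge(user, 1, Integer::sum);
    if (history.size() > historyLength) {
      String evicted = history.removeFirst();
      counts.computeIfPresent(evicted, (u, c) -> c == 1 ? null : c - 1);
    }
  }
}

class HistoryDemo {
  public static void main(String[] args) {
    HistoryRpcSchedulerSketch s = new HistoryRpcSchedulerSketch(8, 4, 2, 1);
    String[] seen = {"Alice", "Bob", "Alice", "Alice", "Bob", "Jerry", "Alice", "Alice"};
    for (String u : seen) {
      s.getPriorityLevel(() -> u);
    }
    System.out.println(s.getPriorityLevel(() -> "Alice"));  // 3
    System.out.println(s.getPriorityLevel(() -> "Bob"));    // 2
    System.out.println(s.getPriorityLevel(() -> "Carlos")); // 0
  }
}
```

Note that the window is measured in calls, not time, which is why a lone, slow caller can still fill it up.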



--
This message was sent by Atlassian JIRA
(v6.2#6252)
