[ https://issues.apache.org/jira/browse/HADOOP-13029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252286#comment-15252286 ]

Ming Ma commented on HADOOP-13029:
----------------------------------

Thanks [~daryn]! Here is the issue we had that motivated this jira, but after 
an offline discussion with [~chrilisf] and team members, we feel that tuning 
FairCallQueue configs should achieve the same result.

With FairCallQueue and backoff, we don't get many complaints about one 
abusive user's impact on other users. The main issue we currently have is a 
heavy user's impact on datanode service RPC requests, which has been growing 
as we continue to expand our cluster size. FairCallQueue only applies to client 
RPC, not to datanode RPC. There was some discussion about this in HADOOP-10599. 
Specifically:

* A heavy user generates lots of RPC requests but only fills up 1/4 of the 
lowest-priority sub-queue. That is already enough to cause lock contention with 
DN RPC requests.
* To have backoff kick in sooner for the heavy user, we can reduce the RPC 
sub-queue length, but that shrinks all RPC sub-queues.
* After the call queue length reduction, if lots of light users belonging to p0 
come in at the same time, some of them will get backed off, since the p0 
sub-queue is much smaller than before. If their calls could overflow into the 
next queue instead, the light users at least wouldn't get backed off. (A rough 
sizing illustration follows this list.)
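
To make the sizing trade-off concrete, here is a back-of-the-envelope 
illustration. The numbers (64 handlers, 4 priority levels, the default 
ipc.server.handler.queue.size of 100, capacity split evenly across sub-queues) 
are made up for illustration, assuming the call queue capacity defaults to 
handler count * ipc.server.handler.queue.size:

{noformat}
call queue capacity    = 64 handlers * 100 (ipc.server.handler.queue.size) = 6400
4 priority levels      -> each sub queue holds ~1600 calls
heavy user in p3       -> backoff starts only after ~1600 of its calls are
                          queued, long after NN lock contention hurts DN RPC
shrink capacity to 400 -> p3 backs off after ~100 calls, but a burst of ~100
                          light-user calls can now also fill p0 and get backed off
{noformat}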

However, tuning several configs, including the client and service RPC handler 
counts and the FairCallQueue weights, should be able to achieve the same result.
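
For reference, these are roughly the knobs involved. The port 8020 and all 
values below are placeholders, and exact key names can vary between Hadoop 
versions:

{noformat}
<!-- hdfs-site.xml: split handler capacity between client and service RPC -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>64</value>
</property>
<property>
  <name>dfs.namenode.service.handler.count</name>
  <value>32</value>
</property>

<!-- core-site.xml: FairCallQueue with backoff on the client RPC port -->
<property>
  <name>ipc.8020.callqueue.impl</name>
  <value>org.apache.hadoop.ipc.FairCallQueue</value>
</property>
<property>
  <name>ipc.8020.backoff.enable</name>
  <value>true</value>
</property>
<property>
  <name>ipc.8020.faircallqueue.multiplexer.weights</name>
  <value>8,4,2,1</value>
</property>
{noformat}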

On a related note, if FairCallQueue is used but backoff is disabled, then, as 
mentioned in the description, the put method moves on to the next queue until 
it lands on the last queue. It isn't clear why it can't just block on the 
assigned sub-queue instead. In other words, why is overflow useful in the 
blocking case? Is it to reduce the chance of reader threads getting blocked? 
Either way, it seems config tuning can also achieve that, similar to the 
argument for the backoff case. A sketch of the paths in question is below.
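
To make the discussion easier to follow, here is a minimal, self-contained 
sketch of the three behaviors in question: blocking put with overflow, the 
current non-blocking offer, and the overflow-on-offer variant this jira 
proposes. This is not the actual FairCallQueue code; the class name, capacity 
handling, and method signatures are simplified for illustration:

{noformat}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Illustration of the put/offer overflow policies, not the real FairCallQueue. */
public class OverflowSketch<E> {
  private final List<BlockingQueue<E>> queues = new ArrayList<>();

  public OverflowSketch(int numLevels, int capacityPerLevel) {
    for (int i = 0; i < numLevels; i++) {
      queues.add(new LinkedBlockingQueue<>(capacityPerLevel));
    }
  }

  /** Backoff disabled: spill into lower priority queues, block only on the last one. */
  public void put(int priorityLevel, E call) throws InterruptedException {
    for (int i = priorityLevel; i < queues.size() - 1; i++) {
      if (queues.get(i).offer(call)) {
        return;                               // landed in this sub-queue
      }
      // full: overflow to the next (lower priority) sub-queue
    }
    queues.get(queues.size() - 1).put(call);  // last queue: block until space frees up
  }

  /** Backoff enabled (current behavior): fail as soon as the assigned sub-queue is full. */
  public boolean offer(int priorityLevel, E call) {
    return queues.get(priorityLevel).offer(call);
  }

  /** Backoff enabled, with the overflow this jira proposes: try all lower queues first. */
  public boolean offerWithOverflow(int priorityLevel, E call) {
    for (int i = priorityLevel; i < queues.size(); i++) {
      if (queues.get(i).offer(call)) {
        return true;
      }
    }
    return false;                             // everything full -> back the caller off
  }
}
{noformat}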


> Have FairCallQueue try all lower priority sub queues before backoff
> -------------------------------------------------------------------
>
>                 Key: HADOOP-13029
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13029
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>
> Currently if FairCallQueue and backoff are enabled, backoff will kick in as 
> soon as the assigned sub queue is filled up.
> {noformat}
>   /**
>    * Put and offer follow the same pattern:
>    * 1. Get the assigned priorityLevel from the call by scheduler
>    * 2. Get the nth sub-queue matching this priorityLevel
>    * 3. delegate the call to this sub-queue.
>    *
>    * But differ in how they handle overflow:
>    * - Put will move on to the next queue until it lands on the last queue
>    * - Offer does not attempt other queues on overflow
>    */
> {noformat}
> It seems better to try lower-priority sub-queues when the assigned sub-queue 
> is full, just like the case when backoff is disabled. This would give regular 
> users more opportunities and allow the cluster to be configured with a smaller 
> call queue length. [~chrili], [~arpitagarwal], what do you think?


