[ 
https://issues.apache.org/jira/browse/HADOOP-17346?focusedWorklogId=507383&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-507383
 ]

ASF GitHub Bot logged work on HADOOP-17346:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Nov/20 23:35
            Start Date: 03/Nov/20 23:35
    Worklog Time Spent: 10m 
      Work Description: amahussein opened a new pull request #2431:
URL: https://github.com/apache/hadoop/pull/2431


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HADOOP-XXXXX. Fix a typo in YYY.)
   For more details, please see 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

            Worklog Id:     (was: 507383)
    Remaining Estimate: 0h
            Time Spent: 10m

> Fair call queue is defeated by abusive service principals
> ---------------------------------------------------------
>
>                 Key: HADOOP-17346
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17346
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common, ipc
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> [~daryn] reported that the FCQ prioritizes based on the full Kerberos 
> principal (i.e. "user/host@realm") rather than the short name (i.e. "user"). 
> The intent is to prevent service principals like the DNs and NMs from being 
> de-prioritized, since service principals are expected to be well behaved. 
> Notably, the DNs contribute a significant but important load, so the intent 
> is not to de-prioritize all DNs because their sum total load is high 
> relative to users.
> This has the unfortunate side effect of allowing misbehaving and non-critical 
> service principals to abuse the FCQ. The gstorm/* principals are a prime 
> example. Each server spams opens as fast as possible, which ensures 
> that none of the gstorm servers can be de-prioritized because each principal 
> accounts for only a fraction of the total load from all principals.
> The secondary and more devastating problem is that other abusive non-service 
> principals cannot be effectively de-prioritized. The sum total of all gstorm 
> load prevents other principals from surpassing the priority thresholds. 
> Principals stay in the highest priority queues, which allows the abusive 
> principals to overflow the entire call queue for extended periods of time. 
> Notably, it prevents the FCQ from moderating the heavy create loads from p_gup 
> @ DB, which cause significant performance degradation.
> Prioritization should be based on short name with configurable exemptions for 
> services like the DN/NM.
> [~daryn] suggested a solution that we applied on our clusters.
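
The proposal in the description (prioritize by short name, with configurable exemptions for services like the DN/NM) can be illustrated with a minimal sketch. It is not the patch in the pull request and does not use Hadoop's scheduler classes. The class name `ShortNameIdentity`, its methods, and the exemption set are hypothetical. Short names are derived here by cutting the principal at the first "/" or "@", which is a simplification of Kerberos name mapping. The call counts in `main` are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not Hadoop's identity provider and not the actual patch.
class ShortNameIdentity {
    // Short names that keep their full principal as identity (assumed configurable).
    private final Set<String> exemptShortNames;

    ShortNameIdentity(Set<String> exemptShortNames) {
        this.exemptShortNames = new HashSet<>(exemptShortNames);
    }

    // "user/host@realm" -> "user", "user@realm" -> "user" (simplified mapping).
    static String shortName(String principal) {
        int end = principal.length();
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash >= 0) end = Math.min(end, slash);
        if (at >= 0) end = Math.min(end, at);
        return principal.substring(0, end);
    }

    // Exempt services are tracked per full principal; everyone else per short name.
    String identityFor(String principal) {
        String name = shortName(principal);
        return exemptShortNames.contains(name) ? principal : name;
    }

    // Fraction of total calls attributed to each scheduling identity.
    Map<String, Double> loadShares(Map<String, Integer> callsByPrincipal) {
        Map<String, Long> callsByIdentity = new HashMap<>();
        long total = 0;
        for (Map.Entry<String, Integer> e : callsByPrincipal.entrySet()) {
            long calls = e.getValue();
            callsByIdentity.merge(identityFor(e.getKey()), calls, Long::sum);
            total += calls;
        }
        Map<String, Double> shares = new HashMap<>();
        for (Map.Entry<String, Long> e : callsByIdentity.entrySet()) {
            shares.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
        return shares;
    }

    public static void main(String[] args) {
        ShortNameIdentity ids =
            new ShortNameIdentity(new HashSet<>(Arrays.asList("dn", "nm")));
        Map<String, Integer> calls = new HashMap<>();
        // Ten gstorm servers, each individually a small fraction of the load.
        for (int i = 0; i < 10; i++) {
            calls.put("gstorm/host" + i + "@REALM", 100);
        }
        calls.put("dn/host0@REALM", 100);
        calls.put("alice@REALM", 100);
        // With short-name identities the gstorm load is summed into one entry.
        System.out.println(ids.loadShares(calls));
    }
}
```

In this sketch the ten gstorm principals collapse into a single identity whose share reflects their combined load, while the exempt DN principal keeps its own per-host identity.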



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
