[
https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ahmed Hussein updated HADOOP-17346:
-----------------------------------
Attachment: HADOOP-17346-branch-3.3.001.patch
> Fair call queue is defeated by abusive service principals
> ---------------------------------------------------------
>
> Key: HADOOP-17346
> URL: https://issues.apache.org/jira/browse/HADOOP-17346
> Project: Hadoop Common
> Issue Type: Bug
> Components: common, ipc
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Labels: pull-request-available
> Attachments: HADOOP-17346.branch-3.2.001.patch,
> HADOOP-17346.branch-3.3.001.patch
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> [~daryn] reported that the FCQ prioritizes based on the full Kerberos
> principal (i.e. "user/host@realm") rather than the short name (i.e. "user") to
> prevent service principals like the DNs and NMs from being de-prioritized, since
> service principals are expected to be well behaved. Notably, the DNs
> contribute a significant but important load, so the intent is not to
> de-prioritize all DNs because their sum total load is high relative to users.
> This has the unfortunate side effect of allowing misbehaving, non-critical
> service principals to abuse the FCQ. The gstorm/* principals are a prime
> example. Each server is spamming opens as fast as possible, which ensures
> that none of the gstorm servers can be de-prioritized because each principal
> accounts for only a fraction of the total load from all principals.
> The secondary and more devastating problem is that other abusive non-service
> principals cannot be effectively de-prioritized. The sum total of all gstorm
> load prevents other principals from surpassing the priority thresholds.
> Principals stay in the highest priority queues, which allows the abusive
> principals to overflow the entire call queue for extended periods of time.
> Notably, it prevents the FCQ from moderating the heavy create loads from p_gup
> @ DB, which cause significant performance degradation.
> Prioritization should be based on short name with configurable exemptions for
> services like the DN/NM.
> [~daryn] suggested a solution that we applied on our clusters.
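
To illustrate the proposal, here is a minimal standalone sketch of prioritizing by short name with an exemption list. It is not the attached patch and does not use Hadoop's IPC classes. The class name, the method names, and the "dn"/"nm" exemption entries are hypothetical, and the short name is derived by a plain string split rather than by Hadoop's configured principal-to-short-name rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; not Hadoop's actual identity provider API. */
final class ShortNameIdentitySketch {

    /** Returns "user" for "user/host@realm" or "user@realm" (simplified assumption). */
    static String shortName(String principal) {
        int end = principal.length();
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash >= 0) {
            end = Math.min(end, slash);
        }
        if (at >= 0) {
            end = Math.min(end, at);
        }
        return principal.substring(0, end);
    }

    /**
     * Exempt services keep their full principal, so each host is accounted
     * for separately. Every other caller collapses to its short name.
     */
    static String identityFor(String principal, Set<String> exemptShortNames) {
        String name = shortName(principal);
        return exemptShortNames.contains(name) ? principal : name;
    }

    /** Fraction of all calls attributed to each scheduling identity. */
    static Map<String, Double> loadShares(Map<String, Integer> callsByPrincipal,
                                          Set<String> exemptShortNames) {
        Map<String, Integer> callsByIdentity = new LinkedHashMap<>();
        long total = 0;
        for (Map.Entry<String, Integer> e : callsByPrincipal.entrySet()) {
            String identity = identityFor(e.getKey(), exemptShortNames);
            callsByIdentity.merge(identity, e.getValue(), Integer::sum);
            total += e.getValue();
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        if (total == 0) {
            return shares; // no calls recorded, nothing to attribute
        }
        for (Map.Entry<String, Integer> e : callsByIdentity.entrySet()) {
            shares.put(e.getKey(), e.getValue() / (double) total);
        }
        return shares;
    }
}
```

With this mapping, the calls from every gstorm/* host accumulate under the single identity "gstorm", so their combined share of the load can cross a priority threshold, while an exempted service such as a DN is still counted per host.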
--
This message was sent by Atlassian Jira
(v8.3.4#803005)