[
https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236433#comment-17236433
]
Eric Payne commented on HADOOP-17346:
-------------------------------------
The patch for branch-3.3 changes the signature for
{{DecayRpcScheduler#computePriorityLevel}} to add an identity object:
{code:java}
- private int computePriorityLevel(long cost) {
+ private int computePriorityLevel(long cost, Object identity) {
{code}
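The sketch below shows why the identity argument matters. It assumes that service users are exempted by always mapping them to priority level 0 (the highest), and it uses share thresholds of 12.5%, 25%, and 50%. The class {{PriorityLevelSketch}}, its fields, and the threshold values are hypothetical. This is not the {{DecayRpcScheduler}} code from trunk.

```java
import java.util.Set;

// Hypothetical sketch, not the actual DecayRpcScheduler implementation.
final class PriorityLevelSketch {
  private final Set<String> serviceUsers;  // assumed exemption list
  private final double[] thresholds;       // assumed ascending share thresholds
  private final long totalCost;            // assumed total cost of all callers

  PriorityLevelSketch(Set<String> serviceUsers, double[] thresholds, long totalCost) {
    this.serviceUsers = serviceUsers;
    this.thresholds = thresholds.clone();
    this.totalCost = totalCost;
  }

  /** Returns a level from 0 (highest priority) to thresholds.length (lowest). */
  int computePriorityLevel(long cost, Object identity) {
    // The exemption needs the caller identity; cost alone cannot express it.
    if (identity instanceof String && serviceUsers.contains(identity)) {
      return 0;
    }
    if (totalCost <= 0) {
      return 0;
    }
    double share = (double) cost / totalCost;
    // Check the largest threshold first: a bigger share means a lower priority.
    for (int i = thresholds.length - 1; i >= 0; i--) {
      if (share >= thresholds[i]) {
        return i + 1;
      }
    }
    return 0;
  }

  public static void main(String[] args) {
    PriorityLevelSketch s = new PriorityLevelSketch(
        Set.of("dn"), new double[] {0.125, 0.25, 0.5}, 1000);
    System.out.println(s.computePriorityLevel(600, "dn"));    // 0: exempt service user
    System.out.println(s.computePriorityLevel(600, "alice")); // 3: 60% share
    System.out.println(s.computePriorityLevel(50, "bob"));    // 0: 5% share
  }
}
```

With the cost alone, the method cannot tell an exempt service user apart from an abusive caller that has the same load. Passing the identity is what makes the exemption expressible.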
This signature change was done in trunk as part of HADOOP-17165. Since this fix
needs that same signature change, we are faced with the following choices:
1) Backport HADOOP-17165 to branch-3.3
2) Make the same change in this patch.
I would rather not implement option 2 because it makes changes hard to track.
However, I don't know whether we want the feature from HADOOP-17165 backported to
branch-3.3.
[~tasanuma], is the service-user feature something we want backported to
earlier branches? We would probably want it pulled back into at least
branch-3.2. It backports cleanly to branch-3.3, but not cleanly to branch-3.2.
> Fair call queue is defeated by abusive service principals
> ---------------------------------------------------------
>
> Key: HADOOP-17346
> URL: https://issues.apache.org/jira/browse/HADOOP-17346
> Project: Hadoop Common
> Issue Type: Bug
> Components: common, ipc
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Labels: pull-request-available
> Attachments: HADOOP-17346-branch-3.1.001.patch,
> HADOOP-17346.branch-3.2.001.patch, HADOOP-17346.branch-3.3.001.patch
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> [~daryn] reported that the FCQ prioritizes based on the full Kerberos
> principal (i.e. "user/host@realm") rather than the short name (i.e. "user") to
> prevent service principals like the DNs and NMs from being de-prioritized, since
> service principals are expected to be well behaved. Notably, the DNs
> contribute a significant but important load, so the intent is not to
> de-prioritize all DNs merely because their combined load is high relative to users.
> This has the unfortunate side effect of allowing misbehaving and non-critical
> service principals to abuse the FCQ. The gstorm/* principals are a prime
> example. Each server is spamming opens as fast as possible, which ensures
> that none of the gstorm servers can be de-prioritized, because each principal
> is only a fraction of the total load from all principals.
> The secondary and more devastating problem is that other abusive non-service
> principals cannot be effectively de-prioritized. The sum total of all gstorm
> load prevents other principals from surpassing the priority thresholds.
> Principals stay in the highest priority queues, which allows the abusive
> principals to overflow the entire call queue for extended periods of time.
> Notably, it prevents the FCQ from moderating the heavy create loads from p_gup
> @ DB, which cause significant performance degradation.
> Prioritization should be based on short name with configurable exemptions for
> services like the DN/NM.
> [~daryn] suggested a solution that we applied on our clusters.
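The proposal can be sketched as follows. The sketch assumes that exempt services (here {{dn}} and {{nm}}) keep their per-host, full-principal key while every other caller is aggregated by short name. The class {{CallerKeys}}, the comma-separated exemption string, and the call counts are hypothetical. The short-name parsing is a simplification that ignores Hadoop's auth_to_local mapping rules.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: key cost accounting by short name, with exemptions.
final class CallerKeys {
  private final Set<String> exemptShortNames = new HashSet<>();

  CallerKeys(String commaSeparatedExemptions) {
    for (String s : commaSeparatedExemptions.split(",")) {
      if (!s.trim().isEmpty()) {
        exemptShortNames.add(s.trim());
      }
    }
  }

  /** "user/host@REALM" -> "user" (simplified; ignores auth_to_local rules). */
  static String shortName(String principal) {
    int end = principal.length();
    int at = principal.indexOf('@');
    if (at >= 0) {
      end = at;
    }
    int slash = principal.indexOf('/');
    if (slash >= 0 && slash < end) {
      end = slash;
    }
    return principal.substring(0, end);
  }

  /** Exempt services keep the full principal; everyone else is aggregated. */
  String keyFor(String principal) {
    String shortName = shortName(principal);
    return exemptShortNames.contains(shortName) ? principal : shortName;
  }

  /** Fraction of all calls attributed to each prioritization key. */
  Map<String, Double> shares(Map<String, Long> callsByPrincipal) {
    Map<String, Long> callsByKey = new HashMap<>();
    long total = 0;
    for (Map.Entry<String, Long> e : callsByPrincipal.entrySet()) {
      callsByKey.merge(keyFor(e.getKey()), e.getValue(), Long::sum);
      total += e.getValue();
    }
    Map<String, Double> result = new HashMap<>();
    for (Map.Entry<String, Long> e : callsByKey.entrySet()) {
      result.put(e.getKey(), (double) e.getValue() / total);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Long> calls = new HashMap<>();
    for (int i = 0; i < 20; i++) {
      calls.put("gstorm/host" + i + "@EXAMPLE.COM", 30L); // 20 x 30 = 600 calls
      calls.put("dn/host" + i + "@EXAMPLE.COM", 10L);     // 20 x 10 = 200 calls
    }
    calls.put("alice@EXAMPLE.COM", 200L);                 // total = 1000 calls
    Map<String, Double> shares = new CallerKeys("dn,nm").shares(calls);
    System.out.println(shares.get("gstorm"));               // 0.6
    System.out.println(shares.get("dn/host0@EXAMPLE.COM")); // 0.01
    System.out.println(shares.get("alice"));                // 0.2
  }
}
```

In this made-up workload each gstorm host accounts for only 3% of calls when keyed by full principal. With, for example, a lowest threshold of 12.5%, none of them would ever be de-prioritized. Keyed by short name, "gstorm" accounts for 60% of calls, while each DN stays at 1% because it is exempt.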
--
This message was sent by Atlassian Jira
(v8.3.4#803005)