[
https://issues.apache.org/jira/browse/AURORA-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894777#comment-15894777
]
Zameer Manji commented on AURORA-1899:
--------------------------------------
I support this idea, and we can put it behind a flag, as we do for
various kinds of SLA metrics.
[~StephanErb]: Consider the case where a single role/user launches 30k 10k
non-prod tasks at the same time. You can observe the aggregate change in the
current metrics, but only the logs will tell you who did it.
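
To make the flag-gated idea concrete, here is a rough sketch of how a per-role
counter could sit next to the existing aggregate counter. Everything in it is
an assumption for illustration: the class name PerRoleThriftMetrics, the
perRoleEnabled flag, and the "method.role" metric naming are hypothetical and
are not taken from Aurora's actual stats code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch only; not Aurora's real metrics API. */
final class PerRoleThriftMetrics {
  // Assumed flag: when false, only the aggregate counters are kept.
  private final boolean perRoleEnabled;
  private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

  PerRoleThriftMetrics(boolean perRoleEnabled) {
    this.perRoleEnabled = perRoleEnabled;
  }

  /** Records one Thrift call for the given method made on behalf of a role. */
  void record(String method, String role) {
    // The aggregate counter is always updated.
    increment(method);
    if (perRoleEnabled) {
      // Assumed naming scheme: "<method>.<role>".
      increment(method + "." + role);
    }
  }

  /** Returns the current value of a counter, or 0 if it was never touched. */
  long get(String name) {
    LongAdder adder = counters.get(name);
    return adder == null ? 0L : adder.sum();
  }

  private void increment(String name) {
    counters.computeIfAbsent(name, k -> new LongAdder()).increment();
  }
}
```

With the flag on, a burst from one role would show up in that role's counter
as well as in the aggregate, so you would not need the logs to attribute it.
With the flag off, the number of exported metrics stays the same as today.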
> Expose per role metrics around Thrift activity
> ----------------------------------------------
>
> Key: AURORA-1899
> URL: https://issues.apache.org/jira/browse/AURORA-1899
> Project: Aurora
> Issue Type: Task
> Reporter: David McLaughlin
>
> It's currently pretty easy for a single client to cause havoc on an Aurora
> cluster. We triage most of these issues by grepping the Scheduler logs for
> Thrift API calls and finding patterns around role names.
> Figuring out what changed would be a lot easier if we could take the current
> Thrift API metrics and export an additional metric for each one that is
> scoped by the role.