[jira] [Commented] (HADOOP-14960) Add GC time percentage monitor/alerter

Xiao Chen (JIRA) Wed, 08 Nov 2017 22:35:22 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245274#comment-16245274 ]


Xiao Chen commented on HADOOP-14960:
------------------------------------

Thanks Misha for the new changes. Incorporating the timestamp and GC time into 
a class, and using a {{GcData}} class to handle update atomicity, looks pretty 
good!

Please fix the checkstyle warnings. While you're at it, I have a few minor 
comments :)
- Can we also call {{setName}} in the {{GcTimeMonitor}} class, for better 
debuggability?
- Let's add a precondition check on {{bufSize}} too, to make sure we don't 
allocate an unreasonably large buffer here (say, cap it at 1M?)
- Trivial Javadoc comments:
{{put a limit on a number of GCTimeMonitor instances}} s/a number/the number/
{{@param observationWindowMs a period until now, over which the percentage}} 
s/a period until now, over which/the interval over which/
- We usually use javadoc comment style on the ASF license class header. Could 
you update {{GcTimeMonitor}}'s first line from {{/\*}} to {{/\*\*}}?
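
To illustrate the first two points, here is a minimal sketch, not the code from the patch. The class name {{MonitorSketch}}, the constant {{MAX_BUF_SIZE}}, and the thread name are all hypothetical, and a plain {{IllegalArgumentException}} stands in for whatever precondition helper the patch ends up using. It assumes the monitor is a {{Thread}} subclass, which is what makes {{setName}} applicable.

```java
// Hypothetical stand-in for the monitor class; not the actual patch code.
class MonitorSketch extends Thread {
  // Assumed upper bound, following the "say, 1M?" suggestion above.
  static final int MAX_BUF_SIZE = 1024 * 1024;

  private final long[] buf;

  MonitorSketch(int bufSize) {
    // Reject non-positive and unreasonably large sizes before allocating.
    if (bufSize <= 0 || bufSize > MAX_BUF_SIZE) {
      throw new IllegalArgumentException(
          "bufSize must be in (0, " + MAX_BUF_SIZE + "], got " + bufSize);
    }
    this.buf = new long[bufSize];
    // A descriptive name makes the thread easy to spot in thread dumps.
    setName("GcTimeMonitor-sketch");
    setDaemon(true);
  }

  int capacity() {
    return buf.length;
  }
}
```

With this in place, a thread dump would show the monitor under a recognizable name, and a misconfigured size fails fast at construction time instead of at allocation.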

> Add GC time percentage monitor/alerter
> --------------------------------------
>
>                 Key: HADOOP-14960
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14960
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>         Attachments: HADOOP-14960.01.patch, HADOOP-14960.02.patch, 
> HADOOP-14960.03.patch
>
>
> Currently class {{org.apache.hadoop.metrics2.source.JvmMetrics}} provides 
> several metrics related to GC. Unfortunately, these metrics are not as 
> useful as they could be, because they don't answer the first and most 
> important question related to GC and JVM health: what percentage of time is 
> my JVM paused in GC? This percentage, calculated as the sum of the GC pauses 
> over some period (say, 1 minute) divided by that period, is the most 
> convenient measure of GC health because:
> - it is just one number, and it's clear that, say, 1..5% is good, but 80..90% 
> is really bad
> - it allows for easy apples-to-apples comparison between runs, even between 
> different apps
> - when this metric reaches some critical value like 70%, it almost always 
> indicates a "GC death spiral", from which the app can recover only if it 
> drops some task(s) etc.
> The existing "total GC time", "total number of GCs" etc. metrics only give 
> numbers that can be used to roughly estimate this percentage. Thus it is 
> suggested to add a new metric to this class, and possibly allow users to 
> register handlers that will be automatically invoked if this metric reaches 
> a specified threshold.
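
The percentage described in the issue summary above can be sketched with the standard {{java.lang.management}} API. This is only a rough illustration under stated assumptions, not the approach taken in the attached patches: the class and method names are hypothetical, and {{getCollectionTime()}} reports approximate accumulated collection time, which for concurrent collectors may not equal pure pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch; names are illustrative and not taken from the patch.
class GcPercentageSketch {

  /** Sums the accumulated collection time of all collectors, in ms. */
  static long totalGcTimeMs() {
    long total = 0;
    for (GarbageCollectorMXBean bean
        : ManagementFactory.getGarbageCollectorMXBeans()) {
      long t = bean.getCollectionTime(); // -1 if undefined for this collector
      if (t > 0) {
        total += t;
      }
    }
    return total;
  }

  /** GC time spent between two samples, as a percentage of the window. */
  static int gcTimePercentage(long gcStartMs, long gcEndMs, long windowMs) {
    if (windowMs <= 0) {
      throw new IllegalArgumentException("windowMs must be positive");
    }
    long gcDeltaMs = Math.max(0, gcEndMs - gcStartMs);
    // Clamp to 100: summed times from several collectors can exceed the window.
    return (int) Math.min(100, gcDeltaMs * 100 / windowMs);
  }

  public static void main(String[] args) throws InterruptedException {
    long windowMs = 1000;
    long before = totalGcTimeMs();
    Thread.sleep(windowMs);
    long after = totalGcTimeMs();
    System.out.println("GC time percentage over window: "
        + gcTimePercentage(before, after, windowMs));
  }
}
```

A threshold handler, as suggested in the description, could then be invoked whenever the returned value reaches a configured limit.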



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

