[ https://issues.apache.org/jira/browse/AMBARI-24244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hari Sekhon updated AMBARI-24244:
---------------------------------
Summary: Grafana HBase GC Time graph wrong / misleading - hiding large GC pauses ~ 2 dozen secs!  (was: Grafana HBase GC Time graph wrong / misleading - hiding large GC pauses)

> Grafana HBase GC Time graph wrong / misleading - hiding large GC pauses ~ 2 dozen secs!
> ---------------------------------------------------------------------------------------
>
>                 Key: AMBARI-24244
>                 URL: https://issues.apache.org/jira/browse/AMBARI-24244
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-metrics, metrics
>    Affects Versions: 2.5.2
>            Reporter: Hari Sekhon
>            Priority: Major
>
> Ambari's built-in Grafana "JVM GC Times" graph in the HBase - RegionServers dashboard is badly wrong: it does not reflect the pause times found by grepping the HBase RegionServer logs for util.JvmPauseMonitor.
> I've inherited a very heavily loaded HBase + OpenTSDB cluster where RegionServers are being lost because GC pauses of around 30 seconds(!) cause ZooKeeper and the HMaster to declare them dead. The Grafana graph shows peaks of only ~70 ms because it averages the GC time spent over all elapsed seconds, which smooths out the peaks and hides the problem entirely. If GCTimeMillis is going to be used, I believe it needs to be divided by GCCount to get the mean pause per collection.
> Otherwise I believe this is actually the wrong metric to be watching, and the following metric from HBase JMX should be monitored instead, using the last value.
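The per-pause average suggested above (GCTimeMillis divided by GCCount) can be sketched in a few lines of plain Java. The class and method names here are illustrative only, not anything Ambari Metrics ships; the real fix would be in the Grafana panel's query:

```java
// Sketch: derive the mean GC pause per collection from deltas of the two
// JVM counters the dashboard already collects, instead of averaging the
// GC time over elapsed wall-clock seconds (which smooths out long pauses).
public class GcPauseAverage {

    /** Mean GC pause in ms over an interval, given the counter deltas. */
    static double avgPauseMs(long gcTimeMillisDelta, long gcCountDelta) {
        // GCTimeMillis / GCCount gives ms per collection; a zero count
        // means no collections happened in the interval.
        return gcCountDelta == 0 ? 0.0
                : (double) gcTimeMillisDelta / gcCountDelta;
    }

    public static void main(String[] args) {
        // One 30 s stop-the-world pause in the sampling interval:
        System.out.println(avgPauseMs(30_000, 1)); // prints 30000.0
        // The same 30 s of GC time averaged over a 60 s interval would
        // have shown as only 500 ms/s on the current graph.
        System.out.println(avgPauseMs(700, 10));   // prints 70.0
    }
}
```

With these inputs the first call surfaces the 30-second pause directly, while 700 ms spread over 10 collections yields the kind of ~70 ms figure the current graph peaks at.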
> That JMX attribute does show the significant GC time being spent:
> {code:java}
> java.lang:type=GarbageCollector,name=G1 Old Generation -> LastGcInfo -> duration{code}
> Obviously make it a regex match so it picks up whichever garbage collector is in use, whether G1, CMS, etc.:
> {code:java}
> java.lang:type=GarbageCollector,name=.*Old Gen.* -> LastGcInfo -> duration{code}
> Right now the GC Times graph is worse than useless: it is actively misleading, because it implies there are no GC issues when this cluster is in fact suffering very large, very severe GC pauses.
> This is a vanilla Ambari-deployed Grafana with Ambari Metrics.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
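For reference, reading that LastGcInfo duration in-process can be sketched with the standard platform MBeans and the com.sun.management extension interface (HotSpot-specific). The class name and the collector-name regex below are illustrative assumptions, not part of Ambari Metrics:

```java
import java.lang.management.ManagementFactory;

import com.sun.management.GarbageCollectorMXBean;
import com.sun.management.GcInfo;

// Sketch: print the duration of the most recent collection for every
// old-generation collector, mirroring the JMX path
// java.lang:type=GarbageCollector,name=.*Old Gen.* -> LastGcInfo -> duration
public class LastGcPause {
    public static void main(String[] args) {
        for (java.lang.management.GarbageCollectorMXBean bean
                : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Match "G1 Old Generation", "ConcurrentMarkSweep", etc.
            if (!bean.getName().matches(".*(Old|Mark|Sweep).*")) {
                continue;
            }
            // LastGcInfo is only exposed on the HotSpot extension interface.
            if (bean instanceof GarbageCollectorMXBean) {
                GcInfo last = ((GarbageCollectorMXBean) bean).getLastGcInfo();
                if (last != null) { // null until the first collection runs
                    System.out.println(bean.getName()
                            + " last pause: " + last.getDuration() + " ms");
                }
            }
        }
    }
}
```

Against a remote RegionServer the same attribute would be fetched over a JMX connector rather than in-process, but the bean name and attribute path are the same.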