Nihal Jain created HBASE-27966:
----------------------------------

             Summary: HBase Master/RS JVM metrics populated incorrectly
                 Key: HBASE-27966
                 URL: https://issues.apache.org/jira/browse/HBASE-27966
             Project: HBase
          Issue Type: Bug
          Components: metrics
    Affects Versions: 2.0.0-alpha-4
            Reporter: Nihal Jain
            Assignee: Nihal Jain


HBase Master/RS JVM metrics are populated incorrectly because of a regression, 
so the Ambari Metrics System is not able to capture them.

Based on my analysis, the issue affects all releases after 2.0.0-alpha-4 
and seems to be caused by HBASE-18846.

I have compared the JVM metrics across 3 versions of HBase; the results are 
attached below:

HBase: 1.1.2
{code:java}
{
    "name" : "Hadoop:service=HBase,name=JvmMetrics",
    "modelerType" : "JvmMetrics",
    "tag.Context" : "jvm",
    "tag.ProcessName" : "RegionServer",
    "tag.SessionId" : "",
    "tag.Hostname" : "HOSTNAME",
    "MemNonHeapUsedM" : 196.05664,
    "MemNonHeapCommittedM" : 347.60547,
    "MemNonHeapMaxM" : 4336.0,
    "MemHeapUsedM" : 7207.315,
    "MemHeapCommittedM" : 66080.0,
    "MemHeapMaxM" : 66080.0,
    "MemMaxM" : 66080.0,
    "GcCount" : 3953,
    "GcTimeMillis" : 662520,
    "ThreadsNew" : 0,
    "ThreadsRunnable" : 214,
    "ThreadsBlocked" : 0,
    "ThreadsWaiting" : 626,
    "ThreadsTimedWaiting" : 78,
    "ThreadsTerminated" : 0,
    "LogFatal" : 0,
    "LogError" : 0,
    "LogWarn" : 0,
    "LogInfo" : 0
  },
{code}
HBase: 2.0.2
{code:java}
{
    "name" : "Hadoop:service=HBase,name=JvmMetrics",
    "modelerType" : "JvmMetrics",
    "tag.Context" : "jvm",
    "tag.ProcessName" : "IO",
    "tag.SessionId" : "",
    "tag.Hostname" : "HOSTNAME",
    "MemNonHeapUsedM" : 203.86688,
    "MemNonHeapCommittedM" : 740.6953,
    "MemNonHeapMaxM" : -1.0,
    "MemHeapUsedM" : 14879.477,
    "MemHeapCommittedM" : 31744.0,
    "MemHeapMaxM" : 31744.0,
    "MemMaxM" : 31744.0,
    "GcCount" : 75922,
    "GcTimeMillis" : 5134691,
    "ThreadsNew" : 0,
    "ThreadsRunnable" : 90,
    "ThreadsBlocked" : 3,
    "ThreadsWaiting" : 158,
    "ThreadsTimedWaiting" : 36,
    "ThreadsTerminated" : 0,
    "LogFatal" : 0,
    "LogError" : 0,
    "LogWarn" : 0,
    "LogInfo" : 0
  },
{code}
HBase: 2.5.2
{code:java}
{
      "name": "Hadoop:service=HBase,name=JvmMetrics",
      "modelerType": "JvmMetrics",
      "tag.Context": "jvm",
      "tag.ProcessName": "IO",
      "tag.SessionId": "",
      "tag.Hostname": "HOSTNAME",
      "MemNonHeapUsedM": 192.9798,
      "MemNonHeapCommittedM": 198.4375,
      "MemNonHeapMaxM": -1.0,
      "MemHeapUsedM": 773.23584,
      "MemHeapCommittedM": 1004.0,
      "MemHeapMaxM": 1024.0,
      "MemMaxM": 1024.0,
      "GcCount": 2048,
      "GcTimeMillis": 25440,
      "ThreadsNew": 0,
      "ThreadsRunnable": 22,
      "ThreadsBlocked": 0,
      "ThreadsWaiting": 121,
      "ThreadsTimedWaiting": 49,
      "ThreadsTerminated": 0,
      "LogFatal": 0,
      "LogError": 0,
      "LogWarn": 0,
      "LogInfo": 0
 },
{code}
It can be observed that from 2.0.x onwards the field "tag.ProcessName" is 
populated as "IO" instead of the expected "RegionServer" or "Master".
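
The comparison above can be automated with a small check. The following is a 
minimal sketch, not part of HBase or Ambari: the class name ProcessNameCheck 
and its methods are hypothetical, and the regex assumes the JvmMetrics bean 
is passed in as a JSON string shaped like the samples in this report.

```java
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
public class ProcessNameCheck {
    // Values this report expects for HBase daemons.
    static final Set<String> EXPECTED = Set.of("RegionServer", "Master");

    // Matches "tag.ProcessName" : "value", with optional whitespace around the colon.
    static final Pattern PROCESS_NAME =
        Pattern.compile("\"tag\\.ProcessName\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the value of tag.ProcessName, or empty if the field is absent.
    public static Optional<String> extractProcessName(String jmxJson) {
        Matcher m = PROCESS_NAME.matcher(jmxJson);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // True only when the field is present and holds an expected daemon name.
    public static boolean isExpected(String jmxJson) {
        return extractProcessName(jmxJson).map(EXPECTED::contains).orElse(false);
    }

    public static void main(String[] args) {
        // Trimmed-down versions of the 1.1.2 and 2.5.2 samples above.
        String v112 = "{ \"tag.Context\" : \"jvm\", \"tag.ProcessName\" : \"RegionServer\" }";
        String v252 = "{ \"tag.Context\": \"jvm\", \"tag.ProcessName\": \"IO\" }";
        System.out.println("1.1.2 ok: " + isExpected(v112)); // 1.1.2 ok: true
        System.out.println("2.5.2 ok: " + isExpected(v252)); // 2.5.2 ok: false
    }
}
```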

Ambari relies on this process name field to create metrics such as 
'jvm.RegionServer.JvmMetrics.GcTimeMillis'. See 
[code.|https://github.com/apache/ambari/blob/2ec4b055d99ec84c902da16dd57df91d571b48d6/ambari-server/src/main/java/org/apache/ambari/server/controller/metrics/timeline/AMSPropertyProvider.java#L722]

But from 2.0.x onwards the field is populated as 'IO', and hence a metric 
named 'jvm.JvmMetrics.GcTimeMillis' is created instead of the expected 
'jvm.RegionServer.JvmMetrics.GcTimeMillis'. This mixes up the metric with 
various other metrics coming from RS, Master, Spark executors etc. running on 
the same host.
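
To illustrate the mix-up, here is a simplified sketch. It is not Ambari's 
actual code: the class MetricNameCollision, the naming rule in metricName() 
and the sample values are assumptions chosen only to reproduce the two metric 
names quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the naming behaviour described in this report.
public class MetricNameCollision {
    // Assumption: only these process names end up in the metric name.
    static final Set<String> KNOWN = Set.of("RegionServer", "Master");

    // Assumed rule: include the process name when it is recognised, drop it otherwise.
    public static String metricName(String context, String processName,
                                    String record, String field) {
        if (KNOWN.contains(processName)) {
            return String.join(".", context, processName, record, field);
        }
        return String.join(".", context, record, field);
    }

    public static void main(String[] args) {
        Map<String, Long> store = new LinkedHashMap<>();

        // Two daemons on the same host, both reporting "IO" (illustrative values).
        store.put(metricName("jvm", "IO", "JvmMetrics", "GcTimeMillis"), 100L);
        store.put(metricName("jvm", "IO", "JvmMetrics", "GcTimeMillis"), 200L);

        // Both writes share one key, so the second value replaces the first.
        System.out.println(store); // {jvm.JvmMetrics.GcTimeMillis=200}

        // With the expected tag values the keys stay distinct.
        System.out.println(metricName("jvm", "RegionServer", "JvmMetrics", "GcTimeMillis"));
        System.out.println(metricName("jvm", "Master", "JvmMetrics", "GcTimeMillis"));
    }
}
```

With distinct process names each daemon keeps its own series; with "IO" for 
every daemon, all of them write to the same name.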

*Expected*
Field "tag.ProcessName" should be populated as "RegionServer" or "Master" 
instead of "IO".

*Actual*
Field "tag.ProcessName" is populated as "IO" instead of the expected 
"RegionServer" or "Master", causing Ambari to publish incorrect metrics. This 
mixes up all the metrics and raises various alerts around JVM metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
