[ 
https://issues.apache.org/jira/browse/HBASE-27966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741087#comment-17741087
 ] 

Nihal Jain edited comment on HBASE-27966 at 7/7/23 4:01 PM:
------------------------------------------------------------

*Root cause analysis*

The issue occurs because the 
[JvmMetrics|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/source/JvmMetrics.java]
 class is a singleton: whichever code initializes it first determines the 
process name, and that same instance then persists for the lifetime of the 
process.

In 
[HRegionServer.init()|https://github.com/apache/hbase/blob/b2e2abe64bd9f3d511b8193510fe66c76ff7854c/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L503],
 the call to {{HFile.checkHFileVersion(this.conf)}} triggers static 
initialization of the HFile class. As a result, JvmMetrics is first 
initialized from the MetricsIO class, because of the following static field in 
[HFile|https://github.com/apache/hbase/blob/b2e2abe64bd9f3d511b8193510fe66c76ff7854c/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java#L172]:
{code:java}
static final MetricsIO metrics = new MetricsIO(new MetricsIOWrapperImpl()); 
{code}
This in turn calls 
[BaseSourceImpl.DefaultMetricsSystemInitializer.init|https://github.com/apache/hbase/blob/b2e2abe64bd9f3d511b8193510fe66c76ff7854c/hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/metrics/BaseSourceImpl.java#L112],
 thus initializing JvmMetrics with process name "IO".
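
To illustrate the mechanism, here is a minimal, self-contained sketch. It does 
not use the real Hadoop or HBase classes: {{ProcessMetrics}}, {{StorageFile}} 
and {{startServer()}} are hypothetical stand-ins for JvmMetrics, HFile and the 
region server startup path. It only shows how calling an unrelated static 
helper can run a static field initializer that claims the singleton first.

```java
public class FirstInitWinsDemo {

    /** Hypothetical stand-in for a process-wide singleton such as JvmMetrics. */
    static final class ProcessMetrics {
        private static ProcessMetrics instance;
        final String processName;

        private ProcessMetrics(String processName) {
            this.processName = processName;
        }

        /** The first caller decides the process name; later calls get the same instance. */
        static synchronized ProcessMetrics initSingleton(String processName) {
            if (instance == null) {
                instance = new ProcessMetrics(processName);
            }
            return instance;
        }
    }

    /** Hypothetical stand-in for HFile: a static field registers metrics as a side effect. */
    static final class StorageFile {
        static final ProcessMetrics METRICS = ProcessMetrics.initSingleton("IO");

        /** Needs no metrics, but invoking it still triggers class initialization. */
        static void checkVersion() {
        }
    }

    /** Hypothetical startup path: the version check runs before the server registers itself. */
    static String startServer() {
        StorageFile.checkVersion();
        return ProcessMetrics.initSingleton("RegionServer").processName;
    }

    public static void main(String[] args) {
        System.out.println(startServer()); // prints "IO", not "RegionServer"
    }
}
```

In this sketch the server asks for "RegionServer" but receives the instance 
that was already created with "IO", which mirrors the behaviour described 
above.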

I tried a patch that removes the call 
{{HFile.checkHFileVersion(this.conf);}}, thereby delaying the initialization of 
JvmMetrics via the code above, and verified that it fixes the issue. See 
[^test_patch.txt]

*Proposed solution*
I have a few proposals for fixing this issue:
 # Explicitly initialize JvmMetrics as one of the first steps when the 
Master/RS starts up.
 # Use lazy initialization to delay the static field initialization, so that 
{{HFile.checkHFileVersion(this.conf);}} does not trigger the MetricsIO 
initialization.
 # Move the helper method of the HFile class into a util class and invoke that 
instead, as shown in the patch.
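
As a rough illustration of the second proposal, the sketch below uses the 
initialization-on-demand holder idiom. It is not the actual HBase code: 
{{IoMetrics}}, {{StorageFile}} and the counter are hypothetical names used only 
to show that the metrics object is created on first use rather than when an 
unrelated static helper is called.

```java
public class LazyMetricsDemo {

    /** Counts how many times the hypothetical metrics object has been created. */
    static int metricsInitCount = 0;

    /** Hypothetical stand-in for MetricsIO. */
    static final class IoMetrics {
        IoMetrics() {
            metricsInitCount++;
        }
    }

    /** Hypothetical stand-in for HFile with a lazily initialized metrics field. */
    static final class StorageFile {

        /** The holder class is initialized only when metrics() is first called. */
        private static final class MetricsHolder {
            static final IoMetrics INSTANCE = new IoMetrics();
        }

        static IoMetrics metrics() {
            return MetricsHolder.INSTANCE;
        }

        /** Initializes StorageFile, but not MetricsHolder. */
        static void checkVersion() {
        }
    }

    static int initCountAfterVersionCheck() {
        StorageFile.checkVersion();
        return metricsInitCount;
    }

    static int initCountAfterMetricsUse() {
        StorageFile.metrics();
        StorageFile.metrics();
        return metricsInitCount;
    }

    public static void main(String[] args) {
        System.out.println(initCountAfterVersionCheck()); // 0: metrics not created yet
        System.out.println(initCountAfterMetricsUse());   // 1: created once, on first use
    }
}
```

With this shape, a startup path that only performs the version check leaves 
the metrics singleton untouched, so the server can register its own process 
name first.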

Let me know which approach sounds good, or if you have better suggestions. I 
will raise a PR accordingly.

CC: [~zhangduo], [~apurtell], [~ndimiduk]


> HBase Master/RS JVM metrics populated incorrectly
> -------------------------------------------------
>
>                 Key: HBASE-27966
>                 URL: https://issues.apache.org/jira/browse/HBASE-27966
>             Project: HBase
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 2.0.0-alpha-4
>            Reporter: Nihal Jain
>            Assignee: Nihal Jain
>            Priority: Major
>         Attachments: test_patch.txt
>
>
> HBase Master/RS JVM metrics are populated incorrectly due to a regression, 
> which prevents the Ambari metrics system from capturing them.
> Based on my analysis, the issue affects all releases after 2.0.0-alpha-4 
> and seems to be caused by HBASE-18846.
> I compared the JVM metrics across 3 versions of HBase; the results are 
> below:
> HBase: 1.1.2
> {code:java}
> {
>     "name" : "Hadoop:service=HBase,name=JvmMetrics",
>     "modelerType" : "JvmMetrics",
>     "tag.Context" : "jvm",
>     "tag.ProcessName" : "RegionServer",
>     "tag.SessionId" : "",
>     "tag.Hostname" : "HOSTNAME",
>     "MemNonHeapUsedM" : 196.05664,
>     "MemNonHeapCommittedM" : 347.60547,
>     "MemNonHeapMaxM" : 4336.0,
>     "MemHeapUsedM" : 7207.315,
>     "MemHeapCommittedM" : 66080.0,
>     "MemHeapMaxM" : 66080.0,
>     "MemMaxM" : 66080.0,
>     "GcCount" : 3953,
>     "GcTimeMillis" : 662520,
>     "ThreadsNew" : 0,
>     "ThreadsRunnable" : 214,
>     "ThreadsBlocked" : 0,
>     "ThreadsWaiting" : 626,
>     "ThreadsTimedWaiting" : 78,
>     "ThreadsTerminated" : 0,
>     "LogFatal" : 0,
>     "LogError" : 0,
>     "LogWarn" : 0,
>     "LogInfo" : 0
>   },
> {code}
> HBase: 2.0.2
> {code:java}
> {
>     "name" : "Hadoop:service=HBase,name=JvmMetrics",
>     "modelerType" : "JvmMetrics",
>     "tag.Context" : "jvm",
>     "tag.ProcessName" : "IO",
>     "tag.SessionId" : "",
>     "tag.Hostname" : "HOSTNAME",
>     "MemNonHeapUsedM" : 203.86688,
>     "MemNonHeapCommittedM" : 740.6953,
>     "MemNonHeapMaxM" : -1.0,
>     "MemHeapUsedM" : 14879.477,
>     "MemHeapCommittedM" : 31744.0,
>     "MemHeapMaxM" : 31744.0,
>     "MemMaxM" : 31744.0,
>     "GcCount" : 75922,
>     "GcTimeMillis" : 5134691,
>     "ThreadsNew" : 0,
>     "ThreadsRunnable" : 90,
>     "ThreadsBlocked" : 3,
>     "ThreadsWaiting" : 158,
>     "ThreadsTimedWaiting" : 36,
>     "ThreadsTerminated" : 0,
>     "LogFatal" : 0,
>     "LogError" : 0,
>     "LogWarn" : 0,
>     "LogInfo" : 0
>   },
> {code}
> HBase: 2.5.2
> {code:java}
> {
>       "name": "Hadoop:service=HBase,name=JvmMetrics",
>       "modelerType": "JvmMetrics",
>       "tag.Context": "jvm",
>       "tag.ProcessName": "IO",
>       "tag.SessionId": "",
>       "tag.Hostname": "HOSTNAME",
>       "MemNonHeapUsedM": 192.9798,
>       "MemNonHeapCommittedM": 198.4375,
>       "MemNonHeapMaxM": -1.0,
>       "MemHeapUsedM": 773.23584,
>       "MemHeapCommittedM": 1004.0,
>       "MemHeapMaxM": 1024.0,
>       "MemMaxM": 1024.0,
>       "GcCount": 2048,
>       "GcTimeMillis": 25440,
>       "ThreadsNew": 0,
>       "ThreadsRunnable": 22,
>       "ThreadsBlocked": 0,
>       "ThreadsWaiting": 121,
>       "ThreadsTimedWaiting": 49,
>       "ThreadsTerminated": 0,
>       "LogFatal": 0,
>       "LogError": 0,
>       "LogWarn": 0,
>       "LogInfo": 0
>  },
> {code}
> From 2.0.x onwards, the field "tag.ProcessName" is populated as "IO" 
> instead of the expected "RegionServer" or "Master".
> Ambari relies on this process name field to create metrics such as 
> 'jvm.RegionServer.JvmMetrics.GcTimeMillis'. See 
> [code.|https://github.com/apache/ambari/blob/2ec4b055d99ec84c902da16dd57df91d571b48d6/ambari-server/src/main/java/org/apache/ambari/server/controller/metrics/timeline/AMSPropertyProvider.java#L722]
> But from 2.0.x onwards the field is populated as 'IO', so a metric named 
> 'jvm.JvmMetrics.GcTimeMillis' is created instead of the expected 
> 'jvm.RegionServer.JvmMetrics.GcTimeMillis'. This mixes the metric up with 
> metrics coming from other processes (RS, Master, Spark executors, etc.) 
> running on the same host.
> *Expected*
> Field "tag.ProcessName" should be populated as "RegionServer" or "Master" 
> instead of "IO".
> *Actual*
> Field "tag.ProcessName" is populated as "IO" instead of the expected 
> "RegionServer" or "Master". This causes Ambari to publish incorrect metrics, 
> mixing up all metrics and raising various alerts around JVM metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
