This is an automated email from the ASF dual-hosted git repository.
gongchao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hertzbeat.git
The following commit(s) were added to refs/heads/master by this push:
new cbe38c792 [feature] add monitoring for Hbase Master (#1820)
cbe38c792 is described below
commit cbe38c79291455d5269421b889ef8815255edb06
Author: Jast <[email protected]>
AuthorDate: Tue Apr 23 19:25:43 2024 +0800
[feature] add monitoring for Hbase Master (#1820)
Co-authored-by: zhangshenghang <[email protected]>
---
home/docs/help/hbase_master.md | 62 +++++
.../current/help/hbase_master.md | 62 +++++
.../src/main/resources/define/app-hbase_master.yml | 265 +++++++++++++++++++++
3 files changed, 389 insertions(+)
diff --git a/home/docs/help/hbase_master.md b/home/docs/help/hbase_master.md
new file mode 100644
index 000000000..66bdb84c5
--- /dev/null
+++ b/home/docs/help/hbase_master.md
@@ -0,0 +1,62 @@
+---
+id: hbase_master
+title: Monitoring Hbase Master
+sidebar_label: HbaseMaster Monitoring
+keywords: [Open Source Monitoring System, Open Source Database Monitoring, HbaseMaster Monitoring]
+---
+> Collect monitoring data for general performance metrics of Hbase Master.
+
+**Protocol: HTTP**
+
+## Pre-monitoring steps
+
+Check the `hbase-site.xml` file to obtain the value of the `hbase.master.info.port` configuration item, which is used for monitoring.
+
+## Configuration Parameters
+
+
+| Parameter Name      | Parameter Description |
+| ------------------- | --------------------- |
+| Target Host         | The IPv4 address, IPv6 address, or domain name of the monitored host. Note: do not include a protocol prefix (e.g., https://, http://). |
+| Port                | The port number of the Hbase Master, 16010 by default. This is the value of the `hbase.master.info.port` parameter. |
+| Task Name           | The name identifying this monitoring task; it must be unique. |
+| Query Timeout       | The timeout for the connection to the Hbase Master, in milliseconds. The default is 6000 milliseconds. |
+| Collection Interval | The interval between periodic data collections, in seconds. The minimum allowed interval is 30 seconds. |
+| Probe               | Whether to probe the target for availability before adding the monitoring task. The add or modify operation proceeds only if the probe succeeds. |
+| Description         | Additional notes describing this monitoring task. |
+
+### Collected Metrics
+
+#### Metric Set: server
+
+
+| Metric Name | Unit | Metric Description |
+| -------------------- | ---- | --------------------------------------- |
+| numRegionServers | none | Number of currently alive RegionServers |
+| numDeadRegionServers | none | Number of currently dead RegionServers |
+| averageLoad | none | Cluster average load |
+| clusterRequests | none | Total number of cluster requests |
+
+#### Metric Set: Rit
+
+
+| Metric Name | Unit | Metric Description |
+| -------------------- | ---- | -------------------------------- |
+| ritCount | none | Current number of RIT |
+| ritCountOverThreshold | none | Number of RIT over the threshold |
+| ritOldestAge | ms | Duration of the oldest RIT |
+
+#### Metric Set: basic
+
+
+| Metric Name             | Unit | Metric Description                          |
+| ----------------------- | ---- | ------------------------------------------- |
+| liveRegionServers       | none | List of currently active RegionServers      |
+| deadRegionServers       | none | List of currently offline RegionServers     |
+| zookeeperQuorum         | none | Zookeeper list                              |
+| masterHostName          | none | Master node                                 |
+| BalancerCluster_num_ops | none | Number of cluster load balancing operations |
+| numActiveHandler        | none | Number of RPC handlers                      |
+| receivedBytes           | MB   | Cluster received data volume                |
+| sentBytes               | MB   | Cluster sent data volume                    |
+| clusterRequests         | none | Total number of cluster requests            |
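
The pre-monitoring step in the help page above (reading `hbase.master.info.port` from `hbase-site.xml`) can be sketched in Python as follows. This is only an illustration and not part of the commit: the function name `master_info_port` and the sample XML with its port value are hypothetical, and the fallback of 16010 is the default stated in the parameter table.

```python
import xml.etree.ElementTree as ET

# Default Hbase Master info port, as stated in the help page.
DEFAULT_INFO_PORT = 16010


def master_info_port(hbase_site_xml: str) -> int:
    """Return hbase.master.info.port from hbase-site.xml text, or the default."""
    root = ET.fromstring(hbase_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "hbase.master.info.port":
            value = (prop.findtext("value") or "").strip()
            return int(value) if value else DEFAULT_INFO_PORT
    # Property not set: the Master uses its default port.
    return DEFAULT_INFO_PORT


# Hypothetical configuration used only for this example.
SAMPLE = """<configuration>
  <property>
    <name>hbase.master.info.port</name>
    <value>26010</value>
  </property>
</configuration>"""

print(master_info_port(SAMPLE))
```

The value returned here is what would be entered in the Port parameter of the monitoring form.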
diff --git a/home/i18n/zh-cn/docusaurus-plugin-content-docs/current/help/hbase_master.md b/home/i18n/zh-cn/docusaurus-plugin-content-docs/current/help/hbase_master.md
new file mode 100644
index 000000000..79d5a7f9b
--- /dev/null
+++ b/home/i18n/zh-cn/docusaurus-plugin-content-docs/current/help/hbase_master.md
@@ -0,0 +1,62 @@
+---
+id: hbase_master
+title: 监控:Hbase Master监控
+sidebar_label: HbaseMaster监控
+keywords: [开源监控系统, 开源数据库监控, HbaseMaster监控]
+---
+> 对Hbase Master的通用性能指标进行采集监控
+
+**使用协议:HTTP**
+
+## 监控前操作
+
+查看 `hbase-site.xml` 文件,获取 `hbase.master.info.port` 配置项的值,该值用作监控使用。
+
+## 配置参数
+
+
+| 参数名称     | 参数帮助描述 |
+| ------------ | ------------------------------------------------------------------------- |
+| 目标Host | 被监控的对端IPV4,IPV6或域名。注意⚠️不带协议头(eg: https://, http://)。 |
+| 端口 | hbase master的端口号,默认为16010。即:`hbase.master.info.port`参数值 |
+| 任务名称 | 标识此监控的名称,名称需要保证唯一性。 |
+| 查询超时时间 | 设置Hbase Master连接的超时时间,单位ms毫秒,默认6000毫秒。 |
+| 采集间隔 | 监控周期性采集数据间隔时间,单位秒,可设置的最小间隔为30秒 |
+| 是否探测 | 新增监控前是否先探测检查监控可用性,探测成功才会继续新增修改操作 |
+| 描述备注 | 更多标识和描述此监控的备注信息,用户可以在这里备注信息 |
+
+### 采集指标
+
+#### 指标集合:server
+
+
+| 指标名称 | 指标单位 | 指标帮助描述 |
+| -------------------- |----| ---------------------------- |
+| numRegionServers | 无 | 当前存活的 RegionServer 个数 |
+| numDeadRegionServers | 无 | 当前Dead的 RegionServer 个数 |
+| averageLoad | 无 | 集群平均负载 |
+| clusterRequests | 无 | 集群请求数量 |
+
+#### 指标集合:Rit
+
+
+| 指标名称 | 指标单位 | 指标帮助描述 |
+| --------------------- | ------ | ------------------- |
+| ritCount | 无 | 当前的 RIT 数量 |
+| ritCountOverThreshold | 无 | 超过阈值的 RIT 数量 |
+| ritOldestAge | ms | 最老的RIT的持续时间 |
+
+#### 指标集合:basic
+
+
+| 指标名称 | 指标单位 | 指标帮助描述 |
+| ----------------------- | ----- | ------------------------ |
+| liveRegionServers | 无 | 当前活跃RegionServer列表 |
+| deadRegionServers | 无 | 当前离线RegionServer列表 |
+| zookeeperQuorum | 无 | Zookeeper列表 |
+| masterHostName | 无 | Master节点 |
+| BalancerCluster_num_ops | 无 | 集群负载均衡次数 |
+| numActiveHandler | 无 | RPC句柄数 |
+| receivedBytes | MB | 集群接收数据量 |
+| sentBytes | MB | 集群发送数据量(MB) |
+| clusterRequests | 无 | 集群总请求数量 |
diff --git a/manager/src/main/resources/define/app-hbase_master.yml b/manager/src/main/resources/define/app-hbase_master.yml
new file mode 100644
index 000000000..c0bbb4e01
--- /dev/null
+++ b/manager/src/main/resources/define/app-hbase_master.yml
@@ -0,0 +1,265 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# The monitoring type category:service-application service monitoring db-database monitoring custom-custom monitoring os-operating system monitoring
+category: bigdata
+# The monitoring type eg: linux windows tomcat mysql aws...
+app: hbase_master
+# The monitoring i18n name
+name:
+  zh-CN: Apache Hbase Master
+  en-US: Apache Hbase Master
+# The description and help of this monitoring type
+help:
+  zh-CN: Hertzbeat 对 Hbase 数据库 Master 节点监控指标进行监控。<br>您可以点击 “<i>新建 Apache Hbase Master</i>” 并进行配置,或者选择“<i>更多操作</i>”,导入已有配置。
+  en-US: Hertzbeat monitors the metrics of the Hbase database Master node. <br>You can click "<i>New Apache Hbase Master</i>" to configure, or select "<i>More Actions</i>" to import an existing configuration.
+  zh-TW: Hertzbeat 對 Hbase 數據庫 Master 節點監控指標進行監控。<br>您可以點擊 “<i>新建 Apache Hbase Master</i>” 並進行配置,或者選擇“<i>更多操作</i>”,導入已有配置。
+
+helpLink:
+  zh-CN: https://hertzbeat.apache.org/zh-cn/docs/help/hbase_master/
+  en-US: https://hertzbeat.apache.org/docs/help/hbase_master/
+# Input params define for monitoring(render web ui by the definition)
+params:
+  # field-param field key
+  - field: host
+    # name-param field display i18n name
+    name:
+      zh-CN: 目标Host
+      en-US: Target Host
+    # type-param field type(most mapping the html input type)
+    type: host
+    # required-true or false
+    required: true
+  # field-param field key
+  - field: port
+    # name-param field display i18n name
+    name:
+      zh-CN: 端口
+      en-US: Port
+    # type-param field type(most mapping the html input type)
+    type: number
+    # when type is number, range is required
+    range: '[0,65535]'
+    # required-true or false
+    required: true
+    # default value
+    defaultValue: 16010
+  # field-param field key
+  - field: timeout
+    # name-param field display i18n name
+    name:
+      zh-CN: 查询超时时间
+      en-US: Query Timeout
+    # type-param field type(most mapping the html input type)
+    type: number
+    # required-true or false
+    required: false
+    # hide param-true or false
+    hide: true
+    # default value
+    defaultValue: 6000
+# collect metrics config list
+metrics:
+  # metrics - Server
+  - name: Server
+    # metrics scheduling priority(0->127)->(high->low), metrics with the same priority will be scheduled in parallel
+    # priority 0's metrics is availability metrics, it will be scheduled first, only availability metrics collect success will the scheduling continue
+    priority: 0
+    # collect metrics content
+    fields:
+      # field-metric name, type-metric type(0-number,1-string), unit-metric unit('%','ms','MB'), label-whether it is a metrics label field
+      - field: numRegionServers
+        type: 0
+        label: true
+        i18n:
+          zh-CN: 活跃RegionServer数量
+          en-US: numRegionServers
+      - field: numDeadRegionServers
+        type: 0
+        label: true
+        i18n:
+          zh-CN: 异常RegionServer数量
+          en-US: numDeadRegionServers
+      - field: averageLoad
+        type: 0
+        label: true
+        i18n:
+          zh-CN: 集群平均负载
+          en-US: averageLoad
+      - field: clusterRequests
+        type: 0
+        label: true
+        i18n:
+          zh-CN: 集群请求数量
+          en-US: clusterRequests
+    # (optional)metrics field alias name, it is used as an alias field to map and convert the collected data and metrics field
+    aliasFields:
+      - $.numRegionServers
+      - $.numDeadRegionServers
+      - $.averageLoad
+      - $.clusterRequests
+    calculates:
+      - numRegionServers=$.numRegionServers
+      - numDeadRegionServers=$.numDeadRegionServers
+      - averageLoad=$.averageLoad
+      - clusterRequests=$.clusterRequests
+    protocol: http
+    http:
+      host: ^_^host^_^
+      port: ^_^port^_^
+      url: /jmx
+      method: GET
+      ssl: ^_^ssl^_^
+      parseType: jsonPath
+      parseScript: '$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")]'
+  - name: Rit
+    # metrics scheduling priority(0->127)->(high->low), metrics with the same priority will be scheduled in parallel
+    # priority 0's metrics is availability metrics, it will be scheduled first, only availability metrics collect success will the scheduling continue
+    priority: 0
+    # collect metrics content
+    fields:
+      # field-metric name, type-metric type(0-number,1-string), unit-metric unit('%','ms','MB'), label-whether it is a metrics label field
+      - field: ritCount
+        type: 0
+        label: true
+        i18n:
+          zh-CN: 当前的 RIT 数量
+          en-US: ritCount
+      - field: ritCountOverThreshold
+        type: 0
+        label: true
+        i18n:
+          zh-CN: 超过阈值的 RIT 数量
+          en-US: ritCountOverThreshold
+      - field: ritOldestAge
+        type: 0
+        label: true
+        i18n:
+          zh-CN: 最老的RIT的持续时间
+          en-US: ritOldestAge
+    # (optional)metrics field alias name, it is used as an alias field to map and convert the collected data and metrics field
+    aliasFields:
+      - $.ritCount
+      - $.ritCountOverThreshold
+      - $.ritOldestAge
+    calculates:
+      - ritCount=$.ritCount
+      - ritCountOverThreshold=$.ritCountOverThreshold
+      - ritOldestAge=$.ritOldestAge
+    protocol: http
+    http:
+      host: ^_^host^_^
+      port: ^_^port^_^
+      url: /jmx
+      method: GET
+      ssl: ^_^ssl^_^
+      parseType: jsonPath
+      parseScript: '$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=AssignmentManager")]'
+  - name: basic
+    # metrics scheduling priority(0->127)->(high->low), metrics with the same priority will be scheduled in parallel
+    # priority 0's metrics is availability metrics, it will be scheduled first, only availability metrics collect success will the scheduling continue
+    priority: 0
+    # collect metrics content
+    fields:
+      # field-metric name, type-metric type(0-number,1-string), unit-metric unit('%','ms','MB'), label-whether it is a metrics label field
+      - field: liveRegionServers
+        type: 1
+        label: true
+        i18n:
+          zh-CN: 当前活跃RegionServer列表
+          en-US: liveRegionServers
+      - field: deadRegionServers
+        type: 1
+        label: true
+        i18n:
+          zh-CN: 当前离线RegionServer列表
+          en-US: deadRegionServers
+      - field: zookeeperQuorum
+        type: 1
+        label: true
+        i18n:
+          zh-CN: Zookeeper列表
+          en-US: zookeeperQuorum
+      - field: masterHostName
+        type: 1
+        label: true
+        i18n:
+          zh-CN: Master节点
+          en-US: masterHostName
+      - field: BalancerCluster_num_ops
+        type: 0
+        label: true
+        i18n:
+          zh-CN: 集群负载均衡次数
+          en-US: BalancerCluster_num_ops
+      - field: numActiveHandler
+        type: 0
+        label: true
+        i18n:
+          zh-CN: RPC句柄数
+          en-US: numActiveHandler
+      - field: receivedBytes
+        type: 0
+        label: true
+        unit: 'MB'
+        i18n:
+          zh-CN: 集群接收数据量(MB)
+          en-US: receivedBytes
+      - field: sentBytes
+        type: 0
+        label: true
+        unit: 'MB'
+        i18n:
+          zh-CN: 集群发送数据量(MB)
+          en-US: sentBytes
+      - field: clusterRequests
+        type: 0
+        label: true
+        i18n:
+          zh-CN: 集群总请求数量
+          en-US: clusterRequests
+    # (optional)metrics field alias name, it is used as an alias field to map and convert the collected data and metrics field
+    aliasFields:
+      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.liveRegionServers']
+      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.deadRegionServers']
+      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.zookeeperQuorum']
+      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.Hostname']
+      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Balancer")].BalancerCluster_num_ops
+      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=IPC")].numActiveHandler
+      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=IPC")].receivedBytes
+      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=IPC")].sentBytes
+      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].clusterRequests
+    calculates:
+      - liveRegionServers=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.liveRegionServers']
+      - deadRegionServers=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.deadRegionServers']
+      - zookeeperQuorum=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.zookeeperQuorum']
+      - masterHostName=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.Hostname']
+      - BalancerCluster_num_ops=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Balancer")].BalancerCluster_num_ops
+      - numActiveHandler=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=IPC")].numActiveHandler
+      - receivedBytes=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=IPC")].receivedBytes
+      - sentBytes=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=IPC")].sentBytes
+      - clusterRequests=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].clusterRequests
+    units:
+      - receivedBytes=B->MB
+      - sentBytes=B->MB
+    protocol: http
+    http:
+      host: ^_^host^_^
+      port: ^_^port^_^
+      url: /jmx
+      method: GET
+      ssl: ^_^ssl^_^
+      parseType: jsonPath
+      parseScript: '$'
\ No newline at end of file
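
The definition above selects beans from the Master's `/jmx` JSON output by their `name` attribute and converts the IPC byte counters to MB. The Python sketch below imitates that selection on a hypothetical, heavily trimmed payload. It is an illustration only, not HertzBeat's collector code: the sample values are made up, the helper names `select_bean` and `bytes_to_mb` are hypothetical, and the conversion assumes 1 MB = 1024 * 1024 bytes, which may not match the rule HertzBeat applies for `B->MB`.

```python
import json

SERVER_BEAN = "Hadoop:service=HBase,name=Master,sub=Server"
IPC_BEAN = "Hadoop:service=HBase,name=Master,sub=IPC"

# Hypothetical, trimmed /jmx response; every value here is invented.
SAMPLE_JMX = json.loads("""
{
  "beans": [
    {"name": "Hadoop:service=HBase,name=Master,sub=Server",
     "tag.Hostname": "master-1",
     "numRegionServers": 3,
     "numDeadRegionServers": 0,
     "averageLoad": 12.5,
     "clusterRequests": 4096},
    {"name": "Hadoop:service=HBase,name=Master,sub=IPC",
     "numActiveHandler": 2,
     "receivedBytes": 10485760,
     "sentBytes": 5242880}
  ]
}
""")


def select_bean(jmx, name):
    """Return the first bean whose "name" equals `name`, or an empty dict.

    Plays the role of the filter $.beans[?(@.name == "...")] in the definition.
    """
    return next((b for b in jmx.get("beans", []) if b.get("name") == name), {})


def bytes_to_mb(value):
    """Convert bytes to MB, assuming 1 MB = 1024 * 1024 bytes (an assumption)."""
    return value / (1024 * 1024)


server = select_bean(SAMPLE_JMX, SERVER_BEAN)
ipc = select_bean(SAMPLE_JMX, IPC_BEAN)

metrics = {
    "numRegionServers": server.get("numRegionServers"),
    "numDeadRegionServers": server.get("numDeadRegionServers"),
    "masterHostName": server.get("tag.Hostname"),
    "receivedBytes": bytes_to_mb(ipc["receivedBytes"]),
    "sentBytes": bytes_to_mb(ipc["sentBytes"]),
}
print(metrics)
```

A real check against a running Master would fetch `http://<host>:<port>/jmx` and pass the parsed JSON to `select_bean` in place of `SAMPLE_JMX`.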