(incubator-pegasus-website) branch master updated: update monitoring doc (#120)

wangdan Thu, 06 Nov 2025 02:09:09 -0800

This is an automated email from the ASF dual-hosted git repository.

wangdan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pegasus-website.git



The following commit(s) were added to refs/heads/master by this push:
     new 30dd8109 update monitoring doc (#120)
30dd8109 is described below

commit 30dd81090b8ed9675e6916441db11685085ead07
Author: Samunroyu <[email protected]>
AuthorDate: Thu Nov 6 18:08:55 2025 +0800

    update monitoring doc (#120)
---
 _docs/en/administration/monitoring.md | 216 +++++++++++++++++++++++++++++++++-
 _docs/zh/administration/monitoring.md |  46 ++++----
 2 files changed, 238 insertions(+), 24 deletions(-)

diff --git a/_docs/en/administration/monitoring.md b/_docs/en/administration/monitoring.md
index 19745e9d..0843ff1a 100644
--- a/_docs/en/administration/monitoring.md
+++ b/_docs/en/administration/monitoring.md
@@ -2,4 +2,218 @@
 permalink: administration/monitoring
 ---
 
-TRANSLATING
+## Components
+
+Since v1.12.0, Pegasus supports collecting and visualizing monitoring metrics using [Prometheus](https://prometheus.io/) and [Grafana](https://grafana.com/).
+
+- Prometheus
+
+> Prometheus is an open-source systems monitoring and alerting toolkit. It stores metrics collected from monitored systems in its own time-series database and provides a multi-dimensional query language (PromQL) to serve different visualization needs.
+
+- Grafana
+
+> Grafana is an open-source analytics and visualization platform. It supports many mainstream time-series data sources, including Prometheus. Grafana queries each data source in that source's own query language and presents the results as charts on configurable dashboards.
+
+**Note**
+
+This document describes only one way to collect and visualize Pegasus monitoring data with Prometheus and Grafana. Pegasus **does not include or maintain** these components. For more details, refer to their official documentation.
+
+## Configure Prometheus
+
+This section describes how to configure Prometheus.
+
+### Step 1: Configure Pegasus services
+
+By default, Pegasus does not report metrics to any external system. To enable Prometheus reporting, modify the configuration as follows:
+
+```ini
+[pegasus.server]
+  perf_counter_sink = prometheus
+  prometheus_port = 9091
+```
+
+> **Note**: To verify that Prometheus reporting is enabled on a Pegasus node, check whether `http://{pegasus_host}:{prometheus_port}/metrics` is accessible.
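The check in the note above can also be scripted. Below is a minimal Python sketch (standard library only) that fetches `/metrics` and extracts its samples, assuming the endpoint serves the Prometheus text exposition format that Prometheus scrapes. The helper names `parse_samples` and `fetch_samples` are hypothetical and not part of Pegasus, and the parser deliberately ignores corner cases such as escaped newlines inside label values.

```python
import re
import urllib.request

# One sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} value [timestamp]
_SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+-?\d+)?$'
)


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and # HELP / # TYPE comments carry no sample
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # a line this simplified parser does not understand
        name, labels, value = match.groups()
        # float() also accepts the special values NaN, +Inf and -Inf.
        samples.append((name, labels or "", float(value)))
    return samples


def fetch_samples(host, port, timeout=5.0):
    """Download http://{host}:{port}/metrics and parse it."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_samples(resp.read().decode("utf-8"))
```

For example, `fetch_samples("pegasus_host", 9091)` (the host name is a placeholder) should return a non-empty list once reporting is enabled, and raise a connection error otherwise.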
+
+#### Using Prometheus with Onebox
+
+If you use onebox, first modify `src/server/config.min.ini` to enable Prometheus reporting, leaving the `prometheus_port` placeholder unchanged:
+
+```ini
+[pegasus.server]
+  perf_counter_sink = prometheus
+  prometheus_port = @PROMETHEUS_PORT@
+```
+
+In onebox mode, multiple Pegasus processes run on a single machine, so the Prometheus ports of the replica, meta, and collector processes would conflict. The current solution is to assign a dedicated Prometheus port to each process:
+
+- collector: 9091
+- meta: [9092, 9093, 9094...]
+- replica: [9092+{META_COUNT}, 9093+{META_COUNT}, 9094+{META_COUNT}...]
+
+For example, a onebox cluster with 2 meta, 3 replica, and 1 collector, started with the command below, uses the following meta and replica ports:
+
+```sh
+./run.sh start_onebox -r 3 -m 2 -c
+```
+
+- meta1: 9092, meta2: 9093
+- replica1: 9094, replica2: 9095, replica3: 9096
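The port rule above can be restated as a small Python function, which is handy when writing the onebox scrape configuration by hand. This is a sketch of the documented scheme only, not code taken from `run.sh`; `onebox_prometheus_ports` is a hypothetical helper.

```python
COLLECTOR_PORT = 9091  # the collector always uses 9091 in onebox mode


def onebox_prometheus_ports(meta_count, replica_count):
    """Map each onebox process name to its Prometheus port.

    meta i (1-based)    -> 9091 + i,              i.e. 9092, 9093, ...
    replica j (1-based) -> 9091 + meta_count + j, i.e. 9092 + META_COUNT, ...
    """
    ports = {"collector": COLLECTOR_PORT}
    for i in range(1, meta_count + 1):
        ports[f"meta{i}"] = COLLECTOR_PORT + i
    for j in range(1, replica_count + 1):
        ports[f"replica{j}"] = COLLECTOR_PORT + meta_count + j
    return ports
```

For the `-m 2 -r 3 -c` cluster, `onebox_prometheus_ports(2, 3)` yields meta1/meta2 on 9092/9093 and replica1..replica3 on 9094..9096, matching the list above.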
+
+### Step 2: Install and run Prometheus
+
+```sh
+wget https://github.com/prometheus/prometheus/releases/download/v2.15.2/prometheus-2.15.2.linux-amd64.tar.gz
+tar xvfz prometheus-2.15.2.linux-amd64.tar.gz
+cd prometheus-2.15.2.linux-amd64
+```
+
+Edit `prometheus.yml` in the Prometheus directory, following this template:
+
+```yaml
+global:
+  scrape_interval: 5s
+
+scrape_configs:
+  - job_name: 'pegasus'
+    static_configs:
+      - targets: ['collector_host:9091']
+        labels:
+          group: collector
+
+      - targets: ['meta_host1:9091', 'meta_host2:9091', 'meta_host3:9091']
+        labels:
+          group: meta
+
+      - targets: ['replica_host1:9091', 'replica_host2:9091', 'replica_host3:9091']
+        labels:
+          group: replica
+      #
+      # NOTE: Add the following lines if node exporter is deployed.
+      # - targets:
+      #     [
+      #       'node_exporter_host1:9100',
+      #       'node_exporter_host2:9100',
+      #       ...
+      #       'node_exporter_hostn:9100',
+      #     ]
+      #   labels:
+      #     group: node_exporter
+```
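For larger clusters, the `static_configs` block above can be generated from host lists instead of being typed by hand. A minimal Python sketch, assuming every Pegasus process listens on the same `prometheus_port` (9091, as in the template) and that host names need no YAML quoting beyond single quotes; `render_scrape_config` and the host names are hypothetical.

```python
def render_scrape_config(groups, port=9091, job_name="pegasus", scrape_interval="5s"):
    """Render a prometheus.yml body with one static_configs entry per group.

    groups maps a group label (e.g. "meta") to a list of host names.
    """
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        "",
        "scrape_configs:",
        f"  - job_name: '{job_name}'",
        "    static_configs:",
    ]
    for group, hosts in groups.items():
        # Every host in a group is scraped on the same prometheus_port.
        targets = ", ".join(f"'{host}:{port}'" for host in hosts)
        lines.append(f"      - targets: [{targets}]")
        lines.append("        labels:")
        lines.append(f"          group: {group}")
    return "\n".join(lines) + "\n"
```

Write the returned string to `prometheus.yml`, then restart Prometheus (or send it `SIGHUP` to reload the configuration).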
+
+For the onebox cluster started by `./run.sh start_onebox -r 3 -m 2 -c`, the actual configuration is:
+
+```yaml
+global:
+  scrape_interval: 5s
+
+scrape_configs:
+  - job_name: "pegasus"
+    static_configs:
+      - targets: ["0.0.0.0:9091"]
+        labels:
+          group: collector
+
+      - targets: ["0.0.0.0:9092", "0.0.0.0:9093"]
+        labels:
+          group: meta
+
+      - targets: ["0.0.0.0:9094", "0.0.0.0:9095", "0.0.0.0:9096"]
+        labels:
+          group: replica
+```
+
+After modifying `prometheus.yml`, start Prometheus:
+
+```sh
+./prometheus --config.file=prometheus.yml
+```
+
+Open [http://localhost:9090](http://localhost:9090). If you see the following page, Prometheus is running:
+
+![prometheus-server](/assets/images/prometheus-server.png)
+
+> **Note**: To verify the Prometheus configuration, check `http://{prometheus_host}:9090/targets`, which shows the scrape status of each node.
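The `/targets` page has a JSON counterpart in the Prometheus HTTP API, `GET /api/v1/targets`. The sketch below lists every active target whose `health` is not `up`. It assumes the response layout of the v1 API (`data.activeTargets[*]` with `labels`, `scrapeUrl`, `lastError` and `health`) and relies on the `group` label from the configuration above; the function names are hypothetical.

```python
import json
import urllib.request


def unhealthy_targets(targets_json):
    """Return (group, scrapeUrl, lastError) for each active target that is not up.

    targets_json is the raw body returned by GET /api/v1/targets.
    """
    body = json.loads(targets_json)
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus API error: {body.get('error')}")
    problems = []
    for target in body["data"]["activeTargets"]:
        if target.get("health") != "up":  # "down" or "unknown"
            group = target.get("labels", {}).get("group", "")
            problems.append(
                (group, target.get("scrapeUrl", ""), target.get("lastError", ""))
            )
    return problems


def fetch_targets(prometheus_host, port=9090, timeout=5.0):
    """Fetch the raw targets JSON from a running Prometheus server."""
    url = f"http://{prometheus_host}:{port}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A typical call is `unhealthy_targets(fetch_targets("localhost"))`; an empty list means every Pegasus node was scraped successfully.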
+
+In the Expression box, enter a query and click Execute to show the results in the Table tab; select the Graph tab to see how the values change over time.
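The same queries can be issued programmatically through the instant-query endpoint `GET /api/v1/query`. A minimal Python sketch that builds the URL and converts an instant-vector response into `(labels, value)` pairs; the helper names are hypothetical, and the built-in `up` metric (1 when the last scrape of a target succeeded) is used only as an example query.

```python
import json
import urllib.parse
import urllib.request


def query_url(prometheus_host, promql, port=9090):
    """Build the URL of an instant query against the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"http://{prometheus_host}:{port}/api/v1/query?{params}"


def parse_instant_vector(response_json):
    """Turn an instant-vector response into (labels, value) pairs."""
    body = json.loads(response_json)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample looks like {"metric": {...labels...}, "value": [unix_time, "value"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]
```

For example, reading the body of `urllib.request.urlopen(query_url("localhost", 'up{job="pegasus"}'))` and passing it to `parse_instant_vector` gives one pair per scraped Pegasus process.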
+
+**Note**
+
+1. In real operations, we often also need machine and OS-level metrics such as cpu.busy and disk.iostat. When deploying a Pegasus cluster, consider deploying a node exporter on each machine. See: [Node Exporter](https://github.com/prometheus/node_exporter)
+
+2. [Alert Manager](https://github.com/prometheus/alertmanager) is Prometheus's alerting component and must be deployed separately (no deployment guide is provided here; refer to the official documentation). With Alert Manager, users can configure alerting policies and receive notifications via email, SMS, etc.
+
+3. Our `prometheus.yml` currently uses static configuration (`static_configs`), whose drawback is that you must update the configuration manually whenever the cluster is scaled out or in. Prometheus supports multiple dynamic service discovery mechanisms (e.g., Kubernetes, Consul, DNS), and you can also implement your own. See: [Configuration](https://prometheus.io/docs/prometheus/latest/configuration/configuration/), [Custom SD](https://prometheus.io/blog/2018/07/05/implementing-custom-sd/)
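As a middle ground between static configuration and a full service discovery integration, Prometheus also supports file-based discovery (`file_sd_configs` with a `files` list) and re-reads the listed JSON or YAML files when they change. The sketch below writes such a JSON file from a role-to-hosts map, keeping the same `group` labels as the static template. The host names, the port, and the helper names `file_sd_groups` and `write_file_sd` are assumptions for illustration.

```python
import json
import os


def file_sd_groups(groups, port=9091):
    """Build file_sd target groups: one entry per Pegasus role."""
    return [
        {"targets": [f"{host}:{port}" for host in hosts], "labels": {"group": group}}
        for group, hosts in groups.items()
    ]


def write_file_sd(path, groups, port=9091):
    """Write the target groups as JSON, replacing the previous file in one step."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(file_sd_groups(groups, port), f, indent=2)
    # os.replace renames atomically on POSIX (same filesystem), so Prometheus
    # never reads a half-written file.
    os.replace(tmp_path, path)
```

After scaling, a deployment script would call `write_file_sd` with the new host lists; no change to `prometheus.yml` is needed once its `file_sd_configs` entry points at that file.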
+
+## Configure Grafana
+
+This section describes how to configure Grafana.
+
+### Step 1: Install and run Grafana
+
+Download the Grafana binary package:
+
+```sh
+wget https://dl.grafana.com/oss/release/grafana-6.0.0.linux-amd64.tar.gz # if it fails, try adding --no-check-certificate
+tar -zxvf grafana-6.0.0.linux-amd64.tar.gz
+cd grafana-6.0.0
+```
+
+Start Grafana:
+
+```sh
+./bin/grafana-server web
+```
+
+If you see output like below, Grafana is started successfully:
+
+```text
+INFO[07-24|14:36:59] Starting Grafana                         logger=server version=6.0.0 commit=34a9a62 branch=HEAD compiled=2019-02-25T22:47:26+0800
+...
+INFO[07-24|14:37:00] HTTP Server Listen                       logger=http.server address=0.0.0.0:3000 protocol=http subUrl= socket=
+INFO[07-24|14:37:00] cleanup of expired auth tokens done      logger=auth count=2
+```
+
+### Step 2: Add Prometheus as a data source
+
+1. Log in to Grafana:
+
+    ![grafana-login](/assets/images/grafana-login.png)
+
+    - Default address: [http://localhost:3000](http://localhost:3000)
+    - Default username: `admin`
+    - Default password: `admin`
+
+    Note: You can skip the Change Password step.
+
+2. Click **Configuration** in the Grafana sidebar, then **Data Sources**.
+
+3. Click **Add data source**.
+
+4. Specify the data source information:
+
+    - In **Name**, give the data source a name.
+    - In **Type**, select **Prometheus**.
+    - In **URL**, specify the address of the Prometheus server, e.g. `http://{prometheus_host}:9090`.
+    - Specify other fields as needed.
+
+5. Click **Add** to save the new data source.
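The same data source can also be created through Grafana's HTTP API (`POST /api/datasources`) instead of the UI, which helps when setting up several environments. A minimal Python sketch that only builds the request, assuming the default `admin`/`admin` credentials and Grafana at `http://localhost:3000`; the data source name `Pegasus-Prometheus` is a hypothetical choice, and `datasource_request` is a hypothetical helper.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url, prometheus_url, name="Pegasus-Prometheus",
                       user="admin", password="admin"):
    """Build (but do not send) a POST /api/datasources request."""
    payload = {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        # "proxy": the Grafana server queries Prometheus on the browser's behalf.
        "access": "proxy",
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(datasource_request("http://localhost:3000", "http://localhost:9090"))`; the Prometheus URL here is a placeholder for your own server.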
+
+### Step 3: Import Pegasus Dashboard
+
+Pegasus provides a dashboard that covers basic monitoring. Its JSON file is available here: [Pegasus dashboard JSON](https://github.com/XiaoMi/pegasus-common/releases/download/deps/grafana-dashboard.json)
+
+After downloading the JSON, import it into Grafana:
+
+Open Grafana, click the "+" in the left sidebar, and select **Import** to open the import page.
+
+![grafana-import-panel](/assets/images/grafana-import-panel-upload.png)
+
+Click the "Upload .json File" button at the top right and select the downloaded file. You will then see the following page:
+
+![grafana-import-panel](/assets/images/grafana-import-panel.png)
+
+Click "Import" at the bottom left to complete the import and open the Pegasus dashboard:
+
+![grafana-dashboard-pegasus](/assets/images/grafana-dashboard-pegasus.png)
+
+The dashboard contains two rows, Pegasus-Cluster and Pegasus-Table, for cluster-level and table-level monitoring respectively. Enter a cluster name in the `cluster_name` field at the top left to view that cluster's metrics.
diff --git a/_docs/zh/administration/monitoring.md b/_docs/zh/administration/monitoring.md
index 3cf544ad..034aeeea 100644
--- a/_docs/zh/administration/monitoring.md
+++ b/_docs/zh/administration/monitoring.md
@@ -16,7 +16,7 @@ permalink: administration/monitoring
 
 **Note**
 
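-This document only provides one way to collect and display Pegasus monitoring data using Prometheus and Grafana. Pegasus**does not include or maintain these components**. For more details about these components, please refer to their official documentation.
+This document only provides one way to collect and display Pegasus monitoring data using Prometheus and Grafana. Pegasus **does not include or maintain these components**. For more details about these components, please refer to their official documentation.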
 
 ## Configure Prometheus
 
@@ -24,7 +24,7 @@ permalink: administration/monitoring
 
 ### Step 1: Configure Pegasus services
 
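-By default, Pegasus does not push monitoring information to any external system; you need to modify the configuration file to enable prometheus push, as follows:
+By default, Pegasus does not push monitoring information to any external system; you need to modify the configuration file to enable prometheus push, as follows: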
 
 ```ini
 [pegasus.server]
@@ -32,11 +32,11 @@ By default, Pegasus does not push monitoring information to any external system; you need to modify the config
   prometheus_port = 9091
 ```
 
-> **Note**: To test whether your Pegasus node has correctly enabled Prometheus push, check whether `http://{pegasus_host}:{prometheus_port}/metrics` is accessible.
+> **Note**: To test whether your Pegasus node has correctly enabled Prometheus push, check whether `http://{pegasus_host}:{prometheus_port}/metrics` is accessible.
 
-#### Using Prometheus with Onebox
+#### Using Prometheus with Onebox
 
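-If using onebox, first modify the configuration file src/server/config.min.ini to enable Prometheus push, but do not change `prometheus_port`.
+If using onebox, first modify the configuration file src/server/config.min.ini to enable Prometheus push, but do not change `prometheus_port`.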
 
 ```ini
 [pegasus.server]
@@ -44,13 +44,13 @@ By default, Pegasus does not push monitoring information to any external system; you need to modify the config
   prometheus_port = @PROMETHEUS_PORT@
 ```
 
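-Since multiple Pegasus service processes are deployed on one machine in onebox mode, the prometheus ports of the replica, meta, and collector processes conflict. Our current solution is to configure a separate prometheus port for each process:
+Since multiple Pegasus service processes are deployed on one machine in onebox mode, the prometheus ports of the replica, meta, and collector processes conflict. Our current solution is to configure a separate prometheus port for each process: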
 
 - collector : 9091
 - meta: [9092, 9093, 9094...]
 - replica: [9092+{META_COUNT}, 9093+{META_COUNT}, 9094+{META_COUNT}...]
 
-For example, for a onebox cluster with 2 meta, 3 replica, and 1 collector, the ports are as follows:
+For example, for a onebox cluster with 2 meta, 3 replica, and 1 collector, the ports are as follows:
 
 ```sh
 ./run.sh start_onebox -r 3 -m 2 -c
@@ -67,7 +67,7 @@ tar xvfz prometheus-2.15.2.linux-amd64.tar.gz
 cd prometheus-2.15.2.linux-amd64
 ```
 
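-Modify the prometheus.yml file in the prometheus directory; the configuration template is as follows:
+Modify the prometheus.yml file in the prometheus directory; the configuration template is as follows: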
 
 ```yaml
 global:
@@ -100,7 +100,7 @@ scrape_configs:
       #     group: node_exporter
 ```
 
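-For the onebox cluster started above with `./run.sh start_onebox -r 3 -m 2 -c`, the actual configuration is as follows:
+For the onebox cluster started above with `./run.sh start_onebox -r 3 -m 2 -c`, the actual configuration is as follows: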
 
 ```yaml
 global:
@@ -122,7 +122,7 @@ scrape_configs:
           group: replica
 ```
 
-After modifying prometheus.yml, start prometheus:
+After modifying prometheus.yml, start prometheus:
 
 ```sh
 ./prometheus --config.file=prometheus.yml
@@ -132,17 +132,17 @@ scrape_configs:
 
 ![prometheus-server](/assets/images/prometheus-server.png)
 
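-> **Note**: To test whether Prometheus is correctly configured, check `http://{prometheus_host}:9090/targets` to view the scrape status of each node.
+> **Note**: To test whether Prometheus is correctly configured, check `http://{prometheus_host}:9090/targets` to view the scrape status of each node.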
 
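-Enter what you want to look up in the Expression box and click Excute to display the results in Element; when Graph is selected, it shows how the value changes over a period of time.
+Enter what you want to look up in the Expression box and click Execute to display the results in Element; when Graph is selected, it shows how the value changes over a period of time.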
 
-***Note***
+**Note**
 
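-1. In actual operations, we usually need some machine and operating-system metrics, such as cpu.busy, disk.iostat, etc., so when deploying a Pegasus cluster, consider deploying a node exporter daemon on each machine. For details see: [Node Exporter](https://github.com/prometheus/node_exporter)
+1. In actual operations, we usually need some machine and operating-system metrics, such as cpu.busy, disk.iostat, etc., so when deploying a Pegasus cluster, consider deploying a node exporter daemon on each machine. For details see: [Node Exporter](https://github.com/prometheus/node_exporter)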
 
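 2. [Alert Manager](https://github.com/prometheus/alertmanager) is the Prometheus alerting component and must be deployed separately (no solution is provided for now; set it up by following the official documentation). With Alert Manager, users can configure alerting policies and receive alerts by email, SMS, etc.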
 
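-3. Currently our prometheus.yml uses static configuration (static_configs); its drawback is that the static configuration must be modified manually when scaling out or in. Prometheus currently supports multiple dynamic service discovery mechanisms, such as k8s, consul, and dns, and users can also implement their own as needed. For details see: [Configuration file reference](https://prometheus.io/docs/prometheus/latest/configuration/configuration/), [Implementing dynamic service discovery](https://prometheus.io/blog/2018/07/05/implementing-custom-sd/)
+3. Currently our prometheus.yml uses static configuration (static_configs); its drawback is that the static configuration must be modified manually when scaling out or in. Prometheus currently supports multiple dynamic service discovery mechanisms, such as k8s, consul, and dns, and users can also implement their own as needed. For details see: [Configuration file reference](https://prometheus.io/docs/prometheus/latest/configuration/configuration/), [Implementing dynamic service discovery](https://prometheus.io/blog/2018/07/05/implementing-custom-sd/)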
 
 ## Configure Grafana
 
@@ -150,7 +150,7 @@ scrape_configs:
 
 ### Step 1: Install and run Grafana
 
-First, download the grafana binary package:
+First, download the grafana binary package:
 
 ```sh
 wget https://dl.grafana.com/oss/release/grafana-6.0.0.linux-amd64.tar.gz //if it fails, try appending --no-check-certificate
@@ -158,7 +158,7 @@ tar -zxvf grafana-6.0.0.linux-amd64.tar.gz
 cd grafana-6.0.0
 ```
 
-Start Grafana
+Start Grafana
 
 ```sh
 ./bin/grafana-server web
@@ -194,7 +194,7 @@ INFO[07-24|14:37:00] cleanup of expired auth tokens done      logger=auth count=
 4. Specify the data source information:
 
     - In **Name**, specify a name for the data source.
-    - In **Type**, select **Prometheus**.
+    - In **Type**, select Prometheus.
     - In **URL**, specify the IP address of Prometheus.
     - Specify other fields as needed.
 
@@ -202,11 +202,11 @@ INFO[07-24|14:37:00] cleanup of expired auth tokens done      logger=auth count=
 
 ### Step 3: Import the Pegasus DashBoard
 
-Currently Pegasus has one DashBoard that provides some basic monitoring information. Its corresponding json file: [Pegasus json file](https://github.com/XiaoMi/pegasus-common/releases/download/deps/grafana-dashboard.json)
+Currently Pegasus has one DashBoard that provides some basic monitoring information. Its corresponding json file: [Pegasus json file](https://github.com/XiaoMi/pegasus-common/releases/download/deps/grafana-dashboard.json)
 
-After downloading the json file, you can import it. The steps are as follows:
+After downloading the json file, you can import it. The steps are as follows:
 
-Open grafana, click the "+" in the left sidebar, select import, and go to the import page
+Open grafana, click the "+" in the left sidebar, select import, and go to the import page.
 
 ![grafana-import-panel](/assets/images/grafana-import-panel-upload.png)
 
@@ -214,8 +214,8 @@ INFO[07-24|14:37:00] cleanup of expired auth tokens done      logger=auth count=
 
 ![grafana-import-panel](/assets/images/grafana-import-panel.png)
 
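-Then click the "import" button at the bottom left to complete the import and go to the corresponding Pegasus DashBoard, which looks as follows
+Then click the "import" button at the bottom left to complete the import and go to the corresponding Pegasus DashBoard, which looks as follows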
 
 ![grafana-import-panel](/assets/images/grafana-dashboard-pegasus.png)
 
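-As the figure shows, the Pegasus DashBoard is divided into two rows: Pegasus-Cluster and Pegasus-Table, representing cluster-level and table-level monitoring respectively. Enter a specific cluster name after cluster_name at the top left to view the corresponding monitoring information of that cluster.
+As the figure shows, the Pegasus DashBoard is divided into two rows: Pegasus-Cluster and Pegasus-Table, representing cluster-level and table-level monitoring respectively. Enter a specific cluster name after cluster_name at the top left to view the corresponding monitoring information of that cluster.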

