This is an automated email from the ASF dual-hosted git repository.
wusheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/skywalking-website.git
The following commit(s) were added to refs/heads/master by this push:
new 0c7729b61e6 Add Async Profiler doc (#759)
0c7729b61e6 is described below
commit 0c7729b61e650a56d5a93950ffbdff08cdcc28c5
Author: zhengziyi0117 <[email protected]>
AuthorDate: Tue Dec 10 09:08:27 2024 +0800
Add Async Profiler doc (#759)
---
.../2024-12-09-skywalking-async-profiler/arch.jpg | Bin 0 -> 120968 bytes
.../create_task.jpg | Bin 0 -> 87240 bytes
.../facade.jpg | Bin 0 -> 79941 bytes
.../2024-12-09-skywalking-async-profiler/index.md | 124 +++++++++++++++++++++
.../performance.jpg | Bin 0 -> 460340 bytes
.../progress.jpg | Bin 0 -> 607978 bytes
.../2024-12-09-skywalking-async-profiler/arch.jpg | Bin 0 -> 120619 bytes
.../create_task.jpg | Bin 0 -> 97019 bytes
.../facade.jpg | Bin 0 -> 197020 bytes
.../2024-12-09-skywalking-async-profiler/index.md | 122 ++++++++++++++++++++
.../performance.jpg | Bin 0 -> 375661 bytes
.../progress.jpg | Bin 0 -> 605463 bytes
12 files changed, 246 insertions(+)
diff --git a/content/blog/2024-12-09-skywalking-async-profiler/arch.jpg
b/content/blog/2024-12-09-skywalking-async-profiler/arch.jpg
new file mode 100644
index 00000000000..dae72c3da38
Binary files /dev/null and
b/content/blog/2024-12-09-skywalking-async-profiler/arch.jpg differ
diff --git a/content/blog/2024-12-09-skywalking-async-profiler/create_task.jpg
b/content/blog/2024-12-09-skywalking-async-profiler/create_task.jpg
new file mode 100644
index 00000000000..6c23af4aae6
Binary files /dev/null and
b/content/blog/2024-12-09-skywalking-async-profiler/create_task.jpg differ
diff --git a/content/blog/2024-12-09-skywalking-async-profiler/facade.jpg
b/content/blog/2024-12-09-skywalking-async-profiler/facade.jpg
new file mode 100644
index 00000000000..393cbeba6fb
Binary files /dev/null and
b/content/blog/2024-12-09-skywalking-async-profiler/facade.jpg differ
diff --git a/content/blog/2024-12-09-skywalking-async-profiler/index.md
b/content/blog/2024-12-09-skywalking-async-profiler/index.md
new file mode 100644
index 00000000000..07a08281b31
--- /dev/null
+++ b/content/blog/2024-12-09-skywalking-async-profiler/index.md
@@ -0,0 +1,124 @@
+---
+title: "Profiling Java application with SkyWalking bundled async-profiler"
+date: 2024-12-09
+author: "zhengziyi0117"
+description: "This document presents an introduction to and usage of the
async-profiler in SkyWalking."
+---
+
+## Background
+
+[Apache SkyWalking](https://skywalking.apache.org/) is an open-source
Application Performance Management system that helps users gather logs, traces,
metrics, and events from various platforms and display them on the UI.
+In version 10.1.0, Apache SkyWalking can perform CPU profiling through eBPF, which supports multiple languages but not Java. This article describes how Apache SkyWalking 10.2.0 uses async-profiler to collect CPU, memory allocation, and lock data for analysis, which removes this limitation and additionally provides memory allocation and occupancy analysis.
+
+## Why use async-profiler
+
+The async-profiler is a low-overhead sampling profiler for Java that does not suffer from the [Safepoint bias problem](http://psy-lob-saw.blogspot.ru/2016/02/why-most-sampling-java-profilers-are.html). It uses HotSpot-specific APIs to collect stack traces and to track memory allocations. The profiler works with OpenJDK and other Java runtimes based on the HotSpot JVM. The async-profiler also officially supports the instruction
set architectures commonly used on Linux and Mac platforms [...]
+
+## Architecture diagram
+
+
+
+### The process of running a profiling task
+
+1. A user submits an async-profiler task in the UI.
+2. The Java agent retrieves the task from the OAP Server.
+3. The Java agent executes the task, collecting profiling samples through async-profiler.
+4. After the profiling is completed, the agent uploads the JFR file to the OAP Server.
+5. The server parses the JFR file to generate profiling results and marks the task as completed.
+6. The user can check the performance analysis result in the UI.
+
+## Demo
+
+You can set up the SkyWalking showcase locally to preview this feature. In this demo, we deploy only the services, the latest released SkyWalking OAP, and the UI.
+
+```sh
+export FEATURE_FLAGS=java-agent-injector,single-node,elasticsearch
+make deploy.kubernetes
+```
+
+After the deployment is complete, run the following command to forward the UI port, then open the SkyWalking UI at http://localhost:8080/.
+
+```sh
+kubectl port-forward svc/ui 8080:8080 --namespace default
+```
+
+### Run the Async Profiling Task Step by Step
+
+After the deployment is complete, navigate to the page of a service that has the Java agent configured. On the service page, you will see the `Async Profiling` component. Click it to open the async profiling page, where tasks are created and analyzed.
+
+
+
+### Create a New Task
+
+Clicking **New Task** on the **Async Profiling** page will direct you to the
following configuration page. The usage of each parameter is explained as
follows:
+
+- **Instance**: This parameter allows you to select the instance of the
service that will execute the profiling. It supports selecting multiple
instances simultaneously for performance analysis.
+- **Duration**: Specifies the duration for the task. The maximum duration is conservatively set to 20 minutes by default, but this can be adjusted through the [Java agent configuration](https://github.com/apache/skywalking-java/blob/7e200bbbb052f0e03e5b2db09e1b0a4c6cf1d71c/apm-sniffer/config/agent.config#L170).
+- **Async Profiling Events**: The profiling events are categorized into three
types of sampling, which will be explained below:
+ - **CPU Sampling**: CPU, WALL, CTIMER, ITIMER. [See the differences between
these four CPU sampling
types](#Differences-in-CPU-sampling-during-task-creation).
+ - **Memory Allocation Sampling**: ALLOC.
+ - **Lock Occupancy Sampling**: LOCK.
+- **ExecArgs**: Extended parameters for **async-profiler**. Detailed [usage
instructions](#ExecArgs-in-task-creation) are available.
+
+
+
+### Check the Progress of the Task
+
+By clicking the task details icon, users can view the **task status logs,
relevant parameters, as well as instances where data collection has either
failed or been successfully completed**. Instances that have successfully
completed data collection will be available for subsequent performance analysis.
+
+> It is important to note that, in containerized deployments where users have
not configured volume mounts, there may be cases where JFR files cannot be
received. To address this, the OAP Server by default uses memory to receive and
parse JFR files. The maximum acceptable size for JFR files is conservatively
set to 30MB by default.
+>
+> Users can customize the default JFR file size in the OAP configuration and
opt to store the files on the filesystem before parsing them, enabling the
platform to handle larger JFR files and ensuring smoother memory allocation.
+>
+> Currently, the JFR parser allocates approximately 1GB of memory in total while processing a 200MB JFR file. (Note that this is the amount of memory allocated during parsing, not the amount of memory required to parse the file.) Users can use this as a reference when configuring their OAP Server.
+
+
+
+### Performance Analysis
+
+Users can select a task and choose the instances they wish to analyze for
performance (multiple instances can be selected for aggregated flame graph
analysis). After selecting the desired JFR event type for analysis, users can
click the **Analyze** button to display the corresponding flame graph.
+
+
+
+## Some Details
+
+### Differences in CPU sampling during task creation
+
+The CPU sampling mechanism supports several modes, each representing a
different sampling engine implemented by async-profiler. These modes include
CPU, WALL, CTIMER, and ITIMER, and differ primarily in how they collect and
generate sampling signals. The following provides a detailed description of
each sampling:
+
+- **CPU**: cpu mode relies on [perf_events](https://man7.org/linux/man-pages/man2/perf_event_open.2.html). The idea is to generate a signal every N nanoseconds of CPU time, which in this case is achieved by configuring the PMU to generate an interrupt every K CPU cycles.
+- **WALL**: Same as CPU sampling, but also samples threads in a non-runnable state, such as sleeping threads.
+- **ITIMER**: itimer mode is based on the [setitimer(ITIMER_PROF)](https://man7.org/linux/man-pages/man2/setitimer.2.html) syscall, which ideally generates a signal every given interval of the CPU time consumed by the process.
+- **CTIMER**: ctimer aims to address the limitations of [perf_events](https://man7.org/linux/man-pages/man2/perf_event_open.2.html) and itimer. It relies on [timer_create](https://man7.org/linux/man-pages/man2/timer_create.2.html) and combines the benefits of cpu and itimer, except that it does not allow collecting kernel stacks.
+
+For details, please refer to the [async-profiler](https://github.com/async-profiler/async-profiler/blob/master/docs/CpuSamplingEngines.md) documentation on CPU sampling engines.
+
+### ExecArgs in task creation
+
+By default, task parameters are separated by commas. When creating a task,
users should refer to the following example format for input:
`lock=10us,interval=10ms`.
+
+Currently, the following parameters are supported by default:
+
+| Option            | Description                                                |
+| :---------------- | :--------------------------------------------------------- |
+| chunksize=N       | approximate size of JFR chunk in bytes (default: 100 MB)   |
+| chunktime=N       | duration of JFR chunk in seconds (default: 1 hour)         |
+| lock\[=DURATION\] | profile contended locks overflowing the DURATION ns bucket |
+| jstackdepth=N     | maximum Java stack depth (default: 2048)                   |
+| interval=N        | sampling interval in ns (default: 10'000'000, i.e. 10 ms)  |
+| alloc\[=BYTES\]   | profile allocations with BYTES interval                    |
+
+For other parameters, please refer to the argument list in [async-profiler](https://github.com/async-profiler/async-profiler/blob/master/src/arguments.cpp#L44). These parameters need to be tested by yourself.
+
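As an illustration of the comma-separated `key=value` format described above, the following Python sketch splits an ExecArgs string into a dictionary. The `parse_exec_args` helper is hypothetical and is not part of SkyWalking or async-profiler; it only mirrors the input format and assumes that a bare option such as `live` is a flag with no value.

```python
def parse_exec_args(exec_args: str) -> dict:
    """Split an ExecArgs string such as 'lock=10us,interval=10ms' into a dict.

    Hypothetical helper for illustration only. Options without '='
    (for example 'live') are stored with a value of None.
    """
    options = {}
    for item in exec_args.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = item.partition("=")
        options[key] = value if sep else None
    return options


if __name__ == "__main__":
    print(parse_exec_args("lock=10us,interval=10ms"))
    # {'lock': '10us', 'interval': '10ms'}
```

The values are kept as strings (for example `10us`), since interpreting units is left to async-profiler itself.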
+### Mapping between sampling types and JFR events in task analysis
+
+| Task sample type | JFR event type | Description | Unit |
+| :--- | :--- | :--- | :--- |
+| CPU<br />WALL<br />ITIMER<br />CTIMER | EXECUTION\_SAMPLE | Multiple **AsyncProfilerEventType** types correspond to the **EXECUTION_SAMPLE** event. This is primarily because different sampling types employ distinct underlying mechanisms and have varying sampling scopes. | Number of samples.<br />The execution time can be calculated based on the sampling interval. For instance, if the number of samples is 10 and the interval is s [...]
+| LOCK | THREAD\_PARK<br />JAVA\_MONITOR\_ENTER | Empty | ns |
+| ALLOC | OBJECT\_ALLOCATION\_IN\_NEW\_TLAB<br />OBJECT\_ALLOCATION\_OUTSIDE\_TLAB | Empty | byte |
+| Add `live` option to extended parameters | PROFILER\_LIVE\_OBJECT | Because it is not in the event parameter of async-profiler, it is not offered as a separate task sampling type in the UI, but is used as an extended parameter | byte |
+
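The mapping in the table above, and the conversion from a sample count to an estimated execution time, can be sketched in Python as follows. The names `JFR_EVENTS_BY_SAMPLE_TYPE`, `jfr_events_for`, and `estimated_time_ms` are hypothetical and are not SkyWalking APIs. The estimate assumes that every sample stands for one full sampling interval and uses the 10 ms default interval listed in the ExecArgs table.

```python
# Sampling types selectable in the UI and the JFR events they produce,
# copied from the table above (illustrative constant, not a SkyWalking API).
JFR_EVENTS_BY_SAMPLE_TYPE = {
    "CPU": ["EXECUTION_SAMPLE"],
    "WALL": ["EXECUTION_SAMPLE"],
    "ITIMER": ["EXECUTION_SAMPLE"],
    "CTIMER": ["EXECUTION_SAMPLE"],
    "LOCK": ["THREAD_PARK", "JAVA_MONITOR_ENTER"],
    "ALLOC": ["OBJECT_ALLOCATION_IN_NEW_TLAB", "OBJECT_ALLOCATION_OUTSIDE_TLAB"],
}


def jfr_events_for(sample_types):
    """Return the de-duplicated JFR event types for the selected sampling types."""
    events = []
    for sample_type in sample_types:
        for event in JFR_EVENTS_BY_SAMPLE_TYPE[sample_type]:
            if event not in events:  # keep first-seen order, drop repeats
                events.append(event)
    return events


def estimated_time_ms(sample_count, interval_ms=10):
    """Estimate execution time, assuming each sample covers one interval."""
    return sample_count * interval_ms
```

For example, `estimated_time_ms(10)` gives 100: 10 samples at a 10 ms interval correspond to roughly 100 ms of execution.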
+### Performance overhead
+
+**There is no performance overhead when an instance is not receiving an
async-profiler task.** Performance impact is only introduced once the
async-profiler performance analysis is initiated. The extent of this overhead
depends on the specific configuration parameters. When using the default
settings, the performance impact typically ranges from 0.3% to 10%. For more
detailed information, please refer to the
[issue](https://github.com/async-profiler/async-profiler/issues/14).
\ No newline at end of file
diff --git a/content/blog/2024-12-09-skywalking-async-profiler/performance.jpg
b/content/blog/2024-12-09-skywalking-async-profiler/performance.jpg
new file mode 100644
index 00000000000..3a4e126a1af
Binary files /dev/null and
b/content/blog/2024-12-09-skywalking-async-profiler/performance.jpg differ
diff --git a/content/blog/2024-12-09-skywalking-async-profiler/progress.jpg
b/content/blog/2024-12-09-skywalking-async-profiler/progress.jpg
new file mode 100644
index 00000000000..12caccc510a
Binary files /dev/null and
b/content/blog/2024-12-09-skywalking-async-profiler/progress.jpg differ
diff --git a/content/zh/2024-12-09-skywalking-async-profiler/arch.jpg
b/content/zh/2024-12-09-skywalking-async-profiler/arch.jpg
new file mode 100644
index 00000000000..5868674fd4e
Binary files /dev/null and
b/content/zh/2024-12-09-skywalking-async-profiler/arch.jpg differ
diff --git a/content/zh/2024-12-09-skywalking-async-profiler/create_task.jpg
b/content/zh/2024-12-09-skywalking-async-profiler/create_task.jpg
new file mode 100644
index 00000000000..1ac737b95e7
Binary files /dev/null and
b/content/zh/2024-12-09-skywalking-async-profiler/create_task.jpg differ
diff --git a/content/zh/2024-12-09-skywalking-async-profiler/facade.jpg
b/content/zh/2024-12-09-skywalking-async-profiler/facade.jpg
new file mode 100644
index 00000000000..b3bdc6c1321
Binary files /dev/null and
b/content/zh/2024-12-09-skywalking-async-profiler/facade.jpg differ
diff --git a/content/zh/2024-12-09-skywalking-async-profiler/index.md
b/content/zh/2024-12-09-skywalking-async-profiler/index.md
new file mode 100644
index 00000000000..ef8139c314d
--- /dev/null
+++ b/content/zh/2024-12-09-skywalking-async-profiler/index.md
@@ -0,0 +1,122 @@
+---
+title: "Profiling Java applications with the async-profiler in SkyWalking"
+date: 2024-12-09
+author: "zhengziyi0117"
+description: "This article presents an introduction to and the usage of async-profiler in SkyWalking"
+---
+
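+## Background
+
+[Apache SkyWalking](https://skywalking.apache.org/) is an open-source application performance management system that helps users collect logs, traces, metrics, and events from various platforms and display them in the UI. In version 10.1.0, Apache SkyWalking can perform CPU profiling through eBPF, which supports multiple languages but not Java. This article describes how Apache SkyWalking 10.2.0 adopts async-profiler to collect and analyze CPU, memory allocation, and lock data, which resolves this limitation and additionally provides memory allocation and occupancy analysis.
+
+## Why use async-profiler?
+
+async-profiler is a low-overhead sampling profiler for Java that does not suffer from the [safepoint bias problem](http://psy-lob-saw.blogspot.ru/2016/02/why-most-sampling-java-profilers-are.html). It collects stack traces and tracks memory allocations based on HotSpot-specific APIs. The profiler works with OpenJDK and other Java runtimes based on the HotSpot JVM. async-profiler also officially supports the instruction set architectures commonly used on Linux and Mac platforms, and it can store sampling data in the JFR format. Compared with the JFR tooling officially provided with the JDK, it supports lower JDK versions (JDK 6).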
+
+
+
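+### The process of a task
+
+1. The user submits an async-profiler task in the UI.
+2. The Java agent retrieves the task from the OAP Server.
+3. The Java agent executes the task, sampling data through async-profiler and writing the samples to a JFR file.
+4. After sampling for the specified duration, the Java agent uploads the JFR file to the OAP Server.
+5. The OAP Server parses the JFR file and records that the corresponding instance has completed.
+6. The user selects instances that have completed the task in the UI for performance analysis.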
+
+## Demo
+
+You can deploy the SkyWalking Showcase locally to preview this feature. In this demo, we deploy only the services, the latest released SkyWalking OAP, and the UI.
+
+```sh
+export FEATURE_FLAGS=java-agent-injector,single-node,elasticsearch
+make deploy.kubernetes
+```
+
+After the deployment is complete, run the following command to forward the UI port, then open the SkyWalking UI at http://localhost:8080/ .
+
+```sh
+kubectl port-forward svc/ui 8080:8080 --namespace default
+```
+
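+### Usage workflow
+
+After the deployment is complete, navigate to the page of a Service that has the Java agent configured. On that service page, you will see the **Async Profiling** component. Click it to open the related feature page and work with it.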
+
+
+
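+### Dispatch a task
+
+Selecting **New Task** on the Async Profiling page displays the following page. The parameters are used as follows:
+
+- **Instance**: The instances that can run the profiling. Multiple instances can be selected and analyzed at the same time.
+- **Duration**: How long the task runs. The default maximum is conservatively set to 20 minutes and can be adjusted through [agent.config](https://github.com/apache/skywalking-java/blob/7e200bbbb052f0e03e5b2db09e1b0a4c6cf1d71c/apm-sniffer/config/agent.config#L170) in the Java agent.
+- **Profiling events**: The events fall roughly into three types of sampling:
+  - **CPU sampling**: CPU, WALL, CTIMER, and ITIMER. For the differences between the four CPU sampling types, see [below](#任务创建中不同CPU采样的区别).
+  - **Memory allocation sampling**: ALLOC
+  - **Lock occupancy sampling**: LOCK
+- **Extended task parameters**: Extended parameters for async-profiler. For usage instructions, see [below](#任务创建中的扩展参数).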
+
+
+
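+### Task progress
+
+After clicking the task details icon, users can view **the task's status logs, its parameters, and the instances where data collection failed or completed successfully**. Instances that completed collection successfully are available for subsequent performance analysis.
+
+> Note that in container deployments where users have not set up volume mounts, JFR files may fail to be received. For this reason, OAP by default receives and parses JFR files in memory, and the accepted JFR file size is set conservatively (30MB by default).
+>
+> Users can configure the default JFR size in OAP and choose to store files on the filesystem before parsing, in order to accept larger JFR files and get smoother memory allocation.
+>
+> Currently, parsing a 200MB JFR file causes roughly 1GB of memory allocation in the JFR parser (**note that this is only memory allocation; it does not mean that 1GB of memory is needed to parse the file**). Users can use this as a reference.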
+
+
+
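+### Performance analysis
+
+Users can click a task and select the instances to analyze (multiple instances can be selected, and their results are aggregated into one flame graph). Then select the JFR event type to analyze and click the **Analyze** button to generate and display the corresponding flame graph.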
+
+
+
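+## Some details
+
+### Differences between CPU sampling types in task creation
+
+CPU sampling comes in the following types: CPU, WALL, CTIMER, and ITIMER. They are essentially different sampling engines implemented by async-profiler. The differences are described below:
+
+- CPU: Based on [perf_events](https://man7.org/linux/man-pages/man2/perf_event_open.2.html). A signal is generated every N nanoseconds of CPU time, which in this case is achieved by configuring the PMU to generate an interrupt every K CPU cycles.
+- WALL: Same as CPU sampling, but it also samples threads in a non-runnable state, such as sleeping threads.
+- ITIMER: Based on the [setitimer](https://man7.org/linux/man-pages/man2/setitimer.2.html) system call, which ideally generates a signal at every given interval of CPU time consumed by the process.
+- CTIMER: Based on the [timer_create](https://man7.org/linux/man-pages/man2/timer_create.2.html) system call. It combines the advantages of CPU and ITIMER, but it does not allow collecting kernel stacks.
+
+For details, refer to the official [async-profiler](https://github.com/async-profiler/async-profiler/blob/master/docs/CpuSamplingEngines.md) documentation.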
+
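+### Extended parameters in task creation
+
+By default, task parameters are separated by commas. When creating a task, users can follow this example format: `lock=10us,interval=10ms`.
+
+Currently, the following parameters are officially supported by default:
+
+| Option          | Description                                                  |
+| --------------- | ------------------------------------------------------------ |
+| chunksize=N     | size of a JFR chunk (default: 100 MB) |
+| chunktime=N     | duration of a JFR chunk (default: 1 hour) |
+| lock[=DURATION] | in lock profiling mode, sample contended locks when the total lock duration overflows the threshold (default: 10us) |
+| jstackdepth=N   | maximum Java stack depth collected when sampling (default: 2048) |
+| interval=N      | CPU sampling interval in ns (default: 10'000'000, i.e. 10 ms) |
+| alloc[=BYTES]   | memory allocation sampling interval, in bytes |
+
+For the remaining parameters, refer to [async-profiler](https://github.com/async-profiler/async-profiler/blob/master/src/arguments.cpp#L44) and test them yourself.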
+
+### Mapping between sampling types and JFR events in task analysis
+
+| Task sample type | JFR event | Notes | Unit |
+| :--- | :--- | :--- | :--- |
+| CPU, WALL, CTIMER, ITIMER | EXECUTION\_SAMPLE | Multiple **AsyncProfilerEventType** types correspond to the **EXECUTION_SAMPLE** event, mainly because the sampling types work on different principles and cover different sampling scopes. | Number of samples<br />The execution time can be calculated from the interval. For example, with 10 samples and an interval of 10ms, the execution time can be considered to be 100ms (the default interval is 10ms) |
+| LOCK | THREAD\_PARK, JAVA\_MONITOR\_ENTER | None | ns |
+| ALLOC | OBJECT\_ALLOCATION\_IN\_NEW\_TLAB, OBJECT\_ALLOCATION\_OUTSIDE\_TLAB | None | byte |
+| Add the `live` option to the extended parameters | PROFILER\_LIVE\_OBJECT | Because it is not among the event parameters of async-profiler, it is not offered as a separate task sampling type in the UI, but is used as an extended parameter | byte |
+
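+### Performance overhead
+
+**An instance incurs no performance overhead while it has not received an async-profiler task; overhead is introduced only after async-profiler profiling starts.** The extent of the overhead depends on the configured parameters. With the default parameters, it is roughly 0.3% to 10%. For more details, refer to the [issue](https://github.com/async-profiler/async-profiler/issues/14).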
+
diff --git a/content/zh/2024-12-09-skywalking-async-profiler/performance.jpg
b/content/zh/2024-12-09-skywalking-async-profiler/performance.jpg
new file mode 100644
index 00000000000..97369b40dc1
Binary files /dev/null and
b/content/zh/2024-12-09-skywalking-async-profiler/performance.jpg differ
diff --git a/content/zh/2024-12-09-skywalking-async-profiler/progress.jpg
b/content/zh/2024-12-09-skywalking-async-profiler/progress.jpg
new file mode 100644
index 00000000000..6a497d5b58e
Binary files /dev/null and
b/content/zh/2024-12-09-skywalking-async-profiler/progress.jpg differ