This is an automated email from the ASF dual-hosted git repository.
yiguolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 8cfd0230fd3 [opt] memory debug tools (#2305)
8cfd0230fd3 is described below
commit 8cfd0230fd340bdde68fbe4fba49600a94db15b5
Author: Xinyi Zou <[email protected]>
AuthorDate: Wed Apr 23 23:16:04 2025 +0800
[opt] memory debug tools (#2305)
## Versions
- [x] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
community/developer-guide/debug-tool.md | 98 ++++++++++++++++++----
.../current/developer-guide/debug-tool.md | 98 ++++++++++++++++++----
2 files changed, 165 insertions(+), 31 deletions(-)
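
Note on the sampling arithmetic in this patch: the docs state that `lg_prof_sample` defaults to `19`, that is, 2^19 B = 512K. The sketch below shows one way to check which interval a given exponent corresponds to. The helper name `sample_interval_bytes` is hypothetical and is not part of Doris or jemalloc.

```shell
#!/bin/bash
# Hypothetical helper (not part of Doris or jemalloc): convert an
# lg_prof_sample exponent into the average sampling interval in bytes.
sample_interval_bytes() {
  local lg="$1"
  echo $(( 1 << lg ))
}

sample_interval_bytes 19   # default from the docs: 2^19 B = 512K
sample_interval_bytes 17   # a smaller exponent samples 4x more often
```

Each step down in the exponent halves the interval, which is why lowering `lg_prof_sample` raises both the profile's fidelity and its overhead.
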
diff --git a/community/developer-guide/debug-tool.md b/community/developer-guide/debug-tool.md
index 079c7b53d02..d4fbd355fc8 100644
--- a/community/developer-guide/debug-tool.md
+++ b/community/developer-guide/debug-tool.md
@@ -222,25 +222,28 @@ The output of this command is the same as the output and view mode of heap profi
For the analysis of the principle of Heap Profile, please refer to [Analysis of the Principle of Heap Profiling](https://cn.pingcap.com/blog/an-explanation-of-the-heap-profiling-principle/). It should be noted that Heap Profile records virtual memory
-It supports real-time and regular dumping of Heap Profile, and then uses `jeprof` to analyze Heap Profile.
+Both real-time and periodic Heap Dumps are supported; the generated Heap Profile is then parsed with `jeprof`.
###### 1. realtime heap dump
-Change `prof:false` of `JEMALLOC_CONF` in `be.conf` to `prof:true` and restart BE, then use the jemalloc heap dump http interface to generate a heap dump file on the corresponding BE machine.
+In `JEMALLOC_CONF` in `be.conf`, change `prof:false` to `prof:true` and `prof_active:false` to `prof_active:true`, then restart Doris BE. After that, use the Jemalloc Heap Dump HTTP interface to generate a Heap Profile file on the corresponding BE machine.
+
+> For Doris 2.1.8, 3.0.4, and later versions, `prof` in `JEMALLOC_CONF` is already `true` by default and does not need to be modified.
+> For Doris versions before 2.1.8 and 3.0.4, `JEMALLOC_CONF` has no `prof_active` option; just change `prof:false` to `prof:true`.
```shell
curl http://be_host:be_webport/jeheap/dump
```
-The directory where the heap dump file is located can be configured through the ``jeprofile_dir`` variable in ``be.conf``, and the default is ``${DORIS_HOME}/log``
+The directory where the Heap Profile file is written can be configured through the `jeprofile_dir` variable in `be.conf`; it defaults to `${DORIS_HOME}/log`.
-The default sampling interval is 512K, usually only 10% of memory is recorded by heap dump, and the impact on performance is usually less than 10%. You can modify `lg_prof_sample` of `JEMALLOC_CONF` in `be.conf`, and the default is `19` (2^19 B = 512K), reducing `lg_prof_sample` can sample more frequently to make the heap profile close to the real memory, but this will bring greater performance loss.
+The default sampling interval is 512K, which usually records only about 10% of the memory, and the impact on performance is usually less than 10%. You can modify `lg_prof_sample` in `JEMALLOC_CONF` in `be.conf`, which defaults to `19` (2^19 B = 512K). Reducing `lg_prof_sample` samples more frequently and brings the Heap Profile closer to real memory usage, at the cost of a greater performance loss.
-If you are doing profiling, keep `prof:false` to avoid the performance penalty of heap dump.
+If you are doing performance testing, keep `prof:false` to avoid the performance loss of Heap Dump.
###### 2. regular heap dump
-First, change `prof:false` of `JEMALLOC_CONF` in `be.conf` to `prof:true`. The default directory of the heap dump file is `${DORIS_HOME}/log`. The file name prefix is `JEMALLOC_PROF_PRFIX` in `be.conf`, which is `jemalloc_heap_profile_` by default.
+First, change `prof:false` in `JEMALLOC_CONF` in `be.conf` to `prof:true`. The Heap Profile file is written to `${DORIS_HOME}/log` by default, and the file name prefix is set by `JEMALLOC_PROF_PRFIX` in `be.conf`, which defaults to `jemalloc_heap_profile_`.
> Before Doris 2.1.6, `JEMALLOC_PROF_PRFIX` is empty and needs to be changed to any value as the profile file name
@@ -265,14 +268,14 @@ Use `jeprof --alloc_space` to display the cumulative value of heap dump.
##### 3. `jeprof` parses Heap Profile
-Use `jeprof` to parse the Heap Profile of the above dump. If the process memory is too large, the parsing process may take several minutes, so please wait patiently. If the system does not have the `jeprof` command, you can package the `jeprof` binary in the `doris/tools` directory and upload it to the Heap Dump server.
+Use `be/bin/jeprof` to parse the Heap Profile dumped above. If the process uses a lot of memory, parsing may take several minutes, so please wait patiently.
-```
-Addr2line version 2.35.2 or above is required, see QA-1 below
-Try to have Heap Dump and `jeprof` to parse Heap Profile on the same server, see QA-2 below
-```
+If there is no `jeprof` binary in the `be/bin` directory of the Doris BE deployment path, you can package the `jeprof` binary from the `doris/tools` directory and upload it to the server.
+
+> addr2line version 2.35.2 or later is required; see QA-1 below for details.
+> Try to run Heap Dump and `jeprof` on the same server, that is, parse the Heap Profile directly on the machine running Doris BE; see QA-2 below for details.
-1. Analyze a single heap dump file
+1. Analyze a single Heap Profile file
```shell
jeprof --dot lib/doris_be heap_dump_file_1
@@ -287,7 +290,7 @@ yum install ghostscript graphviz
jeprof --pdf lib/doris_be heap_dump_file_1 > result.pdf
```
-2. Analyze the diff of two heap dump files
+2. Analyze the diff of two Heap Profile files
```shell
jeprof --dot lib/doris_be --base=heap_dump_file_1 heap_dump_file_2
@@ -332,7 +335,74 @@ Note that you cannot use addr2line 2.3.9, which may be incompatible and cause me
2. After running `jeprof`, many errors appear: `addr2line: DWARF error: invalid or unhandled FORM value: 0x25`. The parsed Heap stack is the memory address of the code, not the function name
-This is because the Heap Dump and the execution of `jeprof` to parse the Heap Profile are not on the same server, which causes `jeprof` to fail to parse the function name using the symbol table. Try to complete the Dump Heap and `jeprof` parsing operations on the same machine.
+This usually happens because Heap Dump and `jeprof` parsing of the Heap Profile are run on different servers, so `jeprof` fails to resolve function names from the symbol table. Try to run Heap Dump and `jeprof` parsing on the same machine, that is, parse the Heap Profile directly on the machine running Doris BE whenever possible.
+
+Alternatively, check the Linux kernel version of the machine running Doris BE, then download the `be/bin/doris_be` binary and the Heap Profile file to a machine with the same kernel version and run `jeprof` there.
+
+3. If the Heap stack is still the memory address of the code rather than the function name, even after parsing the Heap Profile directly on the machine running Doris BE
+
+Use the following script to parse the Heap Profile manually, after modifying these variables:
+
+- heap: the file name of the Heap Profile.
+- bin: the file name of the `be/bin/doris_be` binary.
+- llvm_symbolizer: the path to the llvm symbolizer, preferably the version used to compile the `be/bin/doris_be` binary.
+
+```
+#!/bin/bash
+## @brief
+## @author zhoufei
+## @email [email protected]
+## @date 2024-02-24-Sat
+
+# 1. jeprof --dot ${bin} ${heap} > ${out} to generate the calling profile
+# 2. find the base addr of the binary in the heap file
+# 3. map each addr to its symbol with llvm-symbolizer
+# 4. replace the addr with the symbol
+
+# heap file name
+heap=jeheap_dump.1708694081.3443.945778264.heap
+# binary name
+bin=doris_be_aws.3.0.5
+# path to llvm symbolizer
+llvm_symbolizer=$HOME/opt/ldb-toolchain-16/bin/llvm-symbolizer
+# output file name
+out=out.dot
+vaddr_baddr_symbol=vaddr_baddr_symbol.txt
+program_name=doris_be
+
+jeprof --dot ${bin} ${heap} > ${out}
+
+baseaddr=$(grep ${program_name} ${heap} | head -n 1 | awk -F'-' '{print $1}')
+echo "baseaddr: ${baseaddr}"
+
+function find_symbol() {
+ local addr="$1"
+ "${llvm_symbolizer}" --inlining --obj=${bin} ${addr} | head -n 1 | awk -F'(' '{print $1}'
+}
+
+if [ -f ${vaddr_baddr_symbol} ]; then
+ cat ${vaddr_baddr_symbol} | while read vaddr baddr; do
+ symbol=$(find_symbol ${baddr})
+ echo "${vaddr} ${baddr} ${symbol}"
+ sed -ri.orig "s/${vaddr}/${symbol}/g" ${out}
+ done
+else # recalculate the addr and
+ grep -oP '0x(\d|[a-f])+' ${out} | xargs -I {} python -c "print('{}', '0x{:x}'.format({} - 0x${baseaddr}))" \
+ | while read vaddr baddr; do
+ symbol=$(find_symbol ${baddr})
+ echo "${vaddr} ${baddr} ${symbol}"
+ sed -ri.orig "s/${vaddr}/${symbol}/g" ${out}
+ done | tee ${vaddr_baddr_symbol}
+fi
+
+# vim: et tw=80 ts=2 sw=2 cc=80:
+```
+
+4. If none of the above methods work
+
+- Try recompiling the `be/bin/doris_be` binary on the machine running Doris BE, so that compiling, running, and `jeprof` parsing all happen on the same machine.
+
+- If the Heap stack is still the memory address of the code after that, try `USE_JEMALLOC=OFF ./build.sh --be` to compile Doris BE with TCMalloc, and then refer to the section above to analyze memory with the TCMalloc Heap Profile.
#### LSAN
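
The address arithmetic in the script above (subtracting the load base address from each runtime address before handing it to `llvm-symbolizer`) can also be sketched in pure bash, without `python`. This is only an illustration: the function name `to_binary_offset`, the sample mapping line, and its path are hypothetical. It assumes the base address is the text before the first `-` on the mapping line and has no `0x` prefix, which is what the script's `awk -F'-'` step relies on.

```shell
#!/bin/bash
# Hypothetical mapping line in the style the script greps out of the heap file.
mapping_line='55d0a1e00000-55d0a9f00000 r-xp 00000000 08:01 123456 /opt/doris/be/lib/doris_be'

# The base address is everything before the first '-'
# (equivalent to awk -F'-' '{print $1}' in the script).
baseaddr="${mapping_line%%-*}"

# Convert a runtime virtual address (with 0x prefix) into an offset
# inside the binary, printed as hex for llvm-symbolizer.
to_binary_offset() {
  local vaddr="$1" base="$2"
  printf '0x%x\n' $(( vaddr - 0x${base} ))
}

to_binary_offset 0x55d0a1e01234 "${baseaddr}"
```

Using shell arithmetic avoids spawning one `python` process per address, which may matter when the `.dot` file contains many addresses.
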
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/developer-guide/debug-tool.md b/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/developer-guide/debug-tool.md
index fa6d027bb79..26e9018daeb 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/developer-guide/debug-tool.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/developer-guide/debug-tool.md
@@ -227,25 +227,28 @@ Total: 1296.4 MB
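For an analysis of how Heap Profile works, see [Analysis of the Principle of Heap Profiling](https://cn.pingcap.com/blog/an-explanation-of-the-heap-profiling-principle/). Note that Heap Profile records virtual memory
-The Heap Profile can be dumped either in real time or periodically, and `jeprof` is then used to parse the Heap Profile.
+Both real-time and periodic Heap Dumps are supported; the generated Heap Profile is then parsed with `jeprof`.
###### 1. Real-time Heap Dump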
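-Change `prof:false` in `JEMALLOC_CONF` in `be.conf` to `prof:true` and restart BE, then use the jemalloc heap dump HTTP interface to generate a heap dump file on the corresponding BE machine.
+In `JEMALLOC_CONF` in `be.conf`, change `prof:false` to `prof:true` and `prof_active:false` to `prof_active:true`, then restart Doris BE. After that, use the Jemalloc Heap Dump HTTP interface to generate a Heap Profile file on the corresponding BE machine.
+
+> For Doris 2.1.8, 3.0.4, and later versions, `prof` in `JEMALLOC_CONF` is already `true` by default and does not need to be modified.
+> For Doris versions before 2.1.8 and 3.0.4, `JEMALLOC_CONF` has no `prof_active` option; just change `prof:false` to `prof:true`.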
```shell
curl http://be_host:be_webport/jeheap/dump
```
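-The directory where the heap dump file is located can be configured through the ``jeprofile_dir`` variable in ``be.conf``; the default is ``${DORIS_HOME}/log``
+The directory where the Heap Profile file is written can be configured through the `jeprofile_dir` variable in `be.conf`; it defaults to `${DORIS_HOME}/log`
-The default sampling interval is 512K, so usually only 10% of memory is recorded by heap dump, and the impact on performance is usually less than 10%. You can modify `lg_prof_sample` in `JEMALLOC_CONF` in `be.conf`, which defaults to `19` (2^19 B = 512K). Reducing `lg_prof_sample` samples more frequently and brings the heap profile closer to real memory, but with a greater performance loss.
+The default sampling interval is 512K, so usually only 10% of memory is recorded, and the impact on performance is usually less than 10%. You can modify `lg_prof_sample` in `JEMALLOC_CONF` in `be.conf`, which defaults to `19` (2^19 B = 512K). Reducing `lg_prof_sample` samples more frequently and brings the Heap Profile closer to real memory, but with a greater performance loss.
-If you are doing performance testing, keep `prof:false` to avoid the performance loss of heap dump.
+If you are doing performance testing, keep `prof:false` to avoid the performance loss of Heap Dump.
###### 2. Periodic Heap Dump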
-First, change `prof:false` in `JEMALLOC_CONF` in `be.conf` to `prof:true`. The heap dump file directory defaults to `${DORIS_HOME}/log`, and the file name prefix is `JEMALLOC_PROF_PRFIX` in `be.conf`, which defaults to `jemalloc_heap_profile_`.
+First, change `prof:false` in `JEMALLOC_CONF` in `be.conf` to `prof:true`. The Heap Profile file directory defaults to `${DORIS_HOME}/log`, and the file name prefix is `JEMALLOC_PROF_PRFIX` in `be.conf`, which defaults to `jemalloc_heap_profile_`.
> Before Doris 2.1.6, `JEMALLOC_PROF_PRFIX` is empty and needs to be changed to any value to serve as the profile file name
@@ -270,14 +273,14 @@ heap dump文件所在目录可以在 ``be.conf`` 中通过``jeprofile_dir``变
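##### 3. `jeprof` parses Heap Profile
-Use `jeprof` to parse the Heap Profile dumped above. If the process uses a lot of memory, parsing may take several minutes, so please wait patiently. If the system does not have the `jeprof` command, you can package the `jeprof` binary in the `doris/tools` directory and upload it to the Heap Dump server.
+Use `be/bin/jeprof` to parse the Heap Profile dumped above. If the process uses a lot of memory, parsing may take several minutes, so please wait patiently.
-```
-addr2line version 2.35.2 or above is required, see QA-1 below
-Try to run Heap Dump and `jeprof` parsing of the Heap Profile on the same server, see QA-2 below
-```
+If there is no `jeprof` binary in the `be/bin` directory of the Doris BE deployment path, you can package the `jeprof` binary from the `doris/tools` directory and upload it to the server.
-1. Analyze a single Heap Dump file
+> addr2line version 2.35.2 or later is required; see QA-1 below for details
+> Try to run Heap Dump and `jeprof` parsing of the Heap Profile on the same server, that is, parse the Heap Profile directly on the machine running Doris BE whenever possible; see QA-2 below for details
+
+1. Analyze a single Heap Profile file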
```shell
jeprof --dot lib/doris_be heap_dump_file_1
@@ -292,7 +295,7 @@ heap dump文件所在目录可以在 ``be.conf`` 中通过``jeprofile_dir``变
jeprof --pdf lib/doris_be heap_dump_file_1 > result.pdf
```
-2. Analyze the diff of two Heap Dump files
+2. Analyze the diff of two Heap Profile files
```shell
jeprof --dot lib/doris_be --base=heap_dump_file_1 heap_dump_file_2
@@ -336,13 +339,74 @@ hash -r
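2. After running `jeprof`, many errors appear: `addr2line: DWARF error: invalid or unhandled FORM value: 0x25`, and the parsed Heap stack is the memory address of the code, not the function name
-This is because Heap Dump and `jeprof` parsing of the Heap Profile are not on the same server, so `jeprof` fails to resolve function names from the symbol table. Try to complete the Dump Heap and `jeprof` parsing operations on the same machine.
+This usually happens because Heap Dump and `jeprof` parsing of the Heap Profile are run on different servers, so `jeprof` fails to resolve function names from the symbol table. Try to run Heap Dump and `jeprof` parsing on the same machine, that is, parse the Heap Profile directly on the machine running Doris BE whenever possible.
+
+Alternatively, check the Linux kernel version of the machine running Doris BE, then download the `be/bin/doris_be` binary and the Heap Profile file to a machine with the same kernel version and run `jeprof` there.
+
+3. If the Heap stack is still the memory address of the code rather than the function name, even after parsing the Heap Profile directly on the machine running Doris BE
+
+Use the following script to parse the Heap Profile manually, after modifying these variables:
+
+- heap: the file name of the Heap Profile.
+- bin: the file name of the `be/bin/doris_be` binary
+- llvm_symbolizer: the path to the llvm symbolizer, preferably the version used to compile the `be/bin/doris_be` binary.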
+
+```
+#!/bin/bash
+## @brief
+## @author zhoufei
+## @email [email protected]
+## @date 2024-02-24-Sat
+
+# 1. jeprof --dot ${bin} ${heap} > ${out} to generate the calling profile
+# 2. find the base addr of the binary in the heap file
+# 3. map each addr to its symbol with llvm-symbolizer
+# 4. replace the addr with the symbol
+
+# heap file name
+heap=jeheap_dump.1708694081.3443.945778264.heap
+# binary name
+bin=doris_be_aws.3.0.5
+# path to llvm symbolizer
+llvm_symbolizer=$HOME/opt/ldb-toolchain-16/bin/llvm-symbolizer
+# output file name
+out=out.dot
+vaddr_baddr_symbol=vaddr_baddr_symbol.txt
+program_name=doris_be
+
+jeprof --dot ${bin} ${heap} > ${out}
+
+baseaddr=$(grep ${program_name} ${heap} | head -n 1 | awk -F'-' '{print $1}')
+echo "baseaddr: ${baseaddr}"
+
+function find_symbol() {
+ local addr="$1"
+ "${llvm_symbolizer}" --inlining --obj=${bin} ${addr} | head -n 1 | awk -F'(' '{print $1}'
+}
+
+if [ -f ${vaddr_baddr_symbol} ]; then
+ cat ${vaddr_baddr_symbol} | while read vaddr baddr; do
+ symbol=$(find_symbol ${baddr})
+ echo "${vaddr} ${baddr} ${symbol}"
+ sed -ri.orig "s/${vaddr}/${symbol}/g" ${out}
+ done
+else # recalculate the addr and
+ grep -oP '0x(\d|[a-f])+' ${out} | xargs -I {} python -c "print('{}', '0x{:x}'.format({} - 0x${baseaddr}))" \
+ | while read vaddr baddr; do
+ symbol=$(find_symbol ${baddr})
+ echo "${vaddr} ${baddr} ${symbol}"
+ sed -ri.orig "s/${vaddr}/${symbol}/g" ${out}
+ done | tee ${vaddr_baddr_symbol}
+fi
+
+# vim: et tw=80 ts=2 sw=2 cc=80:
+```
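-3. If Heap Dump and `jeprof` parsing of the Heap Profile run on the same server, but the parsed Heap stack is still the memory address of the code, not the function name
+4. If none of the above methods work
-Try recompiling Doris BE on the Heap Dump machine, that is, compile and run Doris BE on one machine, and run Heap Dump and `jeprof` parsing on that machine.
+- Try recompiling the `be/bin/doris_be` binary on the machine running Doris BE, so that compiling, running, and `jeprof` parsing all happen on the same machine.
-After the above operation, if the Heap stack is still the memory address of the code, try `USE_JEMALLOC=OFF ./build.sh --be` to compile Doris BE with TCMalloc, and then refer to the section above to analyze memory with the TCMalloc Heap Profile.
+- If the Heap stack is still the memory address of the code after that, try `USE_JEMALLOC=OFF ./build.sh --be` to compile Doris BE with TCMalloc, and then refer to the section above to analyze memory with the TCMalloc Heap Profile.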
#### LSAN
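
As a rough illustration of the `JEMALLOC_CONF` edit described in both files of this patch, the two option flips can be applied with `sed`. The sample line below is a shortened, hypothetical `JEMALLOC_CONF` value, and `enable_heap_profiling` is a made-up helper name; check the actual line in your `be.conf` before editing it.

```shell
#!/bin/bash
# Hypothetical helper: flip prof and prof_active from false to true
# in text read from stdin, leaving every other option untouched.
enable_heap_profiling() {
  sed -e 's/prof:false/prof:true/' \
      -e 's/prof_active:false/prof_active:true/'
}

# Shortened, made-up sample value; a real be.conf line carries more options.
sample='JEMALLOC_CONF="prof:false,prof_active:false,lg_prof_sample:19"'
printf '%s\n' "${sample}" | enable_heap_profiling
```

The function reads stdin and writes stdout, so it never touches `be.conf` itself; the result can be reviewed before anything is written back, and BE still has to be restarted for the change to take effect.
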