This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 71b80dd261a [opt] opt debug tool doc (#3305)
71b80dd261a is described below
commit 71b80dd261a9487d116f58a4abbefcc09ed2d9f1
Author: Mingyu Chen (Rayner) <[email protected]>
AuthorDate: Mon Jan 26 20:14:11 2026 +0800
[opt] opt debug tool doc (#3305)
---
community/developer-guide/debug-tool.md | 406 ++++++++++----------
.../current/developer-guide/debug-tool.md | 409 +++++++++++----------
2 files changed, 434 insertions(+), 381 deletions(-)
diff --git a/community/developer-guide/debug-tool.md
b/community/developer-guide/debug-tool.md
index 8e221abc967..1be773ac77c 100644
--- a/community/developer-guide/debug-tool.md
+++ b/community/developer-guide/debug-tool.md
@@ -1,7 +1,8 @@
---
{
- "title": "Debug Tool",
- "language": "en"
+ "title": "Debugging Tools",
+ "language": "en",
+ "description": "A comprehensive guide to debugging tools and methods for
Apache Doris, including FE and BE debugging techniques, memory profiling with
Jemalloc and TCMalloc, memory leak detection with LSAN and ASAN, and CPU
profiling with pprof and perf."
}
---
@@ -24,185 +25,192 @@ specific language governing permissions and limitations
under the License.
-->
-# Debug Tool
+# Debugging Tools
-In the process of using and developing Doris, we often encounter scenarios
that need to debug Doris. Here are some common debugging tools.
+During Doris usage and development, debugging is often necessary. This
document introduces commonly used debugging tools and methods.
-**The name of the BE binary that appears in this doc is `doris_be`, which was
`palo_be` in previous versions.**
+**Note: The BE binary file name `doris_be` mentioned in this document was
`palo_be` in earlier versions.**
-## FE debugging
+## FE Debugging
-Fe is a java process. Here are just a few simple and commonly used java
debugging commands.
+FE is a Java process. Below are some commonly used Java debugging commands.
-1. Statistics of current memory usage details
+### 1. Memory Usage Statistics
- ```
- jmap -histo:live pid > 1. jmp
- ```
+```bash
+jmap -histo:live pid > 1.jmp
+```
- This command can enumerate and sort the memory occupation of living
objects. (replace PID with Fe process ID)
+This command lists the memory usage of live objects sorted by size (replace
pid with the FE process ID).
- ```
- num #instances #bytes class name
- ----------------------------------------------
- 1: 33528 10822024 [B
- 2: 80106 8662200 [C
- 3: 143 4688112 [Ljava.util.concurrent.ForkJoinTask;
- 4: 80563 1933512 java. lang.String
- 5: 15295 1714968 java. lang.Class
- 6: 45546 1457472 java. util. concurrent.
ConcurrentHashMap$Node
- 7: 15483 1057416 [Ljava.lang.Object;
- ```
+```text
+ num #instances #bytes class name
+----------------------------------------------
+ 1: 33528 10822024 [B
+ 2: 80106 8662200 [C
+ 3: 143 4688112 [Ljava.util.concurrent.ForkJoinTask;
+ 4: 80563 1933512 java.lang.String
+ 5: 15295 1714968 java.lang.Class
+ 6: 45546 1457472 java.util.concurrent.ConcurrentHashMap$Node
+ 7: 15483 1057416 [Ljava.lang.Object;
+```
- You can use this method to view the total memory occupied by the currently
living objects (at the end of the file) and analyze which objects occupy more
memory.
+This method allows you to view the total memory occupied by live objects (at
the end of the file) and analyze which objects consume more memory.
- Note that this method will trigger fullgc because `: live 'is specified.
+**Note:** This method triggers a FullGC due to the `:live` parameter.
-2. Check JVM memory usage
+### 2. JVM Memory Usage
- ```
- jstat -gcutil pid 1000 1000
- ```
+```bash
+jstat -gcutil pid 1000 1000
+```
- This command can scroll through the memory usage of each region of the
current JVM. (replace PID with Fe process ID)
+This command prints the usage of each JVM memory region once per second (interval 1000 ms), for 1000 samples (replace pid with the FE process ID).
- ```
- S0 S1 E O M CCS YGC YGCT FGC FGCT
GCT
- 0.00 0.00 22.61 3.03 95.74 92.77 68 1.249 5 0.794
2.043
- 0.00 0.00 22.61 3.03 95.74 92.77 68 1.249 5 0.794
2.043
- 0.00 0.00 22.61 3.03 95.74 92.77 68 1.249 5 0.794
2.043
- 0.00 0.00 22.92 3.03 95.74 92.77 68 1.249 5 0.794
2.043
- 0.00 0.00 22.92 3.03 95.74 92.77 68 1.249 5 0.794
2.043
- ```
+```text
+ S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
+ 0.00 0.00 22.61 3.03 95.74 92.77 68 1.249 5 0.794
2.043
+ 0.00 0.00 22.61 3.03 95.74 92.77 68 1.249 5 0.794
2.043
+ 0.00 0.00 22.61 3.03 95.74 92.77 68 1.249 5 0.794
2.043
+ 0.00 0.00 22.92 3.03 95.74 92.77 68 1.249 5 0.794
2.043
+ 0.00 0.00 22.92 3.03 95.74 92.77 68 1.249 5 0.794
2.043
+```
- The main focus is on the percentage of old area (o) (3% in the example).
If the occupancy is too high, oom or fullgc may occur.
+Focus on the Old generation (O) percentage (3.03% in the example). High usage
may lead to OOM or FullGC.
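The Old generation check described above can be scripted. The following is a minimal sketch, not part of the Doris tooling: `old_gen_above` is a hypothetical helper name, and it assumes the `jstat -gcutil` column layout shown above (O in column 4, FGC in column 9), which can differ between JDK versions.

```bash
#!/bin/bash
# Hypothetical helper (not shipped with Doris): reads `jstat -gcutil` output on
# stdin and prints every sample whose Old generation usage exceeds a threshold.
# Assumes the column layout shown above: column 4 is O, column 9 is FGC.
old_gen_above() {
  local threshold="$1"
  awk -v t="$threshold" '
    # Skip the header row: only rows with a numeric 4th column are samples.
    $4 ~ /^[0-9.]+$/ && ($4 + 0) > (t + 0) { print "O=" $4 "% FGC=" $9 }
  '
}

# Sample taken from the output above; with a 3% threshold it is reported.
printf '%s\n' \
  '  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT' \
  '  0.00   0.00  22.61   3.03  95.74  92.77     68    1.249     5    0.794    2.043' |
  old_gen_above 3
```

Against a live FE this could be used as `jstat -gcutil pid 1000 10 | old_gen_above 80`, where 80 is an arbitrary example threshold rather than a recommended value.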
-3. Print Fe thread stack
+### 3. Print FE Thread Stack
- ```
- jstack -l pid > 1. js
- ```
+```bash
+jstack -l pid > 1.js
+```
- This command can print the thread stack of the current Fe. (replace PID
with Fe process ID).
- `-L ` the parameter will detect whether there is deadlock at the same
time. This method can check the operation of Fe thread, whether there is
deadlock, where it is stuck, etc.
+This command prints the current FE thread stack (replace pid with the FE
process ID).
-## BE debugging
+The `-l` parameter also detects deadlocks. This method can be used to view FE
thread execution status, detect deadlocks, and locate blocking positions.
-### Memory
+## BE Debugging
-Debugging memory is generally divided into two aspects. One is whether the
total amount of memory use is reasonable. On the one hand, the excessive amount
of memory use may be due to memory leak in the system, on the other hand, it
may be due to improper use of program memory. The second is whether there is a
problem of memory overrun and illegal access, such as program access to memory
with an illegal address, use of uninitialized memory, etc. For the debugging of
memory, we usually use [...]
+### Memory Debugging
-#### Jemalloc HEAP PROFILE
+Memory debugging focuses on two aspects:
-> Doris 1.2.2 version starts to use Jemalloc as the memory allocator by
default.
+1. **Reasonable memory usage**: Whether the total amount of memory used is reasonable. Excessive usage may be caused by a memory leak or by improper use of memory.
+2. **Legal memory access**: Whether there are out-of-bounds or illegal accesses, such as accessing an invalid address or using uninitialized memory.
-For the principle analysis of Heap Profile, refer to [Heap Profiling Principle
Analysis](https://cn.pingcap.com/blog/an-explanation-of-the-heap-profiling-principle/).
It should be noted that Heap Profile records virtual memory
+The following tools can be used for tracking and analysis.
-Supports real-time and periodic Heap Dump, and then uses `jeprof` to parse the
generated Heap Profile.
+#### Jemalloc Heap Profile
-##### 1. Real-time Heap Dump, used to analyze real-time memory
+> **Note:** Doris 1.2.2 and later versions use Jemalloc as the default memory
allocator.
-Change `prof:false` in `JEMALLOC_CONF` in `be.conf` to `prof:true`, change
`prof_active:false` to `prof_active:true` and restart Doris BE, then use the
Jemalloc Heap Dump HTTP interface to generate a Heap Profile file on the
corresponding BE machine.
+For Heap Profiling principles, refer to [Heap Profiling Principle
Explanation](https://cn.pingcap.com/blog/an-explanation-of-the-heap-profiling-principle/).
Note that Heap Profile records virtual memory.
-> For Doris 2.1.8 and 3.0.4 and later versions, `prof` in `JEMALLOC_CONF` is
already `true` by default, no need to modify.
+Both real-time and periodic Heap Dumps are supported. The generated Heap Profile is then parsed with the `jeprof` tool.
-For Doris versions before 2.1.8 and 3.0.4, there is no `prof_active` in
`JEMALLOC_CONF`, just change `prof:false` to `prof:true`.
+##### 1. Real-time Heap Dump (for analyzing real-time memory)
-```shell
+In `be.conf`, change `prof:false` to `prof:true` and `prof_active:false` to
`prof_active:true` in `JEMALLOC_CONF`, then restart Doris BE. Use the Jemalloc
Heap Dump HTTP interface to generate Heap Profile files on the BE machine.
+
+> **Version Notes:**
+> - Doris 2.1.8, 3.0.4 and later: `prof` is already `true` by default in
`JEMALLOC_CONF`, no modification needed.
+> - Before Doris 2.1.8 and 3.0.4: `JEMALLOC_CONF` doesn't have `prof_active`
option, just change `prof:false` to `prof:true`.
+
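As a sketch of the configuration change described above, the hypothetical helper `enable_jemalloc_prof` rewrites a `JEMALLOC_CONF` value so that `prof` and `prof_active` are both `true`. The sample value is illustrative only; the real `JEMALLOC_CONF` line in `be.conf` contains other options and should be inspected before editing.

```bash
#!/bin/bash
# Hypothetical helper: reads a JEMALLOC_CONF value on stdin and turns on
# heap profiling by flipping prof and prof_active from false to true.
# All other options are passed through untouched.
enable_jemalloc_prof() {
  sed -e 's/prof:false/prof:true/' \
      -e 's/prof_active:false/prof_active:true/'
}

# Illustrative value only, not the real default from be.conf.
echo 'background_thread:true,prof:false,prof_active:false' | enable_jemalloc_prof
```

The change only takes effect after Doris BE is restarted with the updated `be.conf`.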
+```bash
curl http://be_host:be_webport/jeheap/dump
```
-The directory where the Heap Profile file is located can be configured in
`be.conf` through the `jeprofile_dir` variable, which defaults to
`${DORIS_HOME}/log`
+**Configuration:**
-The default sampling interval is 512K, which usually only records 10% of the
memory, and the impact on performance is usually less than 10%. You can modify
`lg_prof_sample` in `JEMALLOC_CONF` in `be.conf`, which defaults to `19` (2^19
B = 512K). Reducing `lg_prof_sample` can sample more frequently to make the
Heap Profile closer to the real memory, but this will bring greater performance
loss.
+- **Heap Profile directory**: Configure via `jeprofile_dir` in `be.conf`,
defaults to `${DORIS_HOME}/log`.
+- **Sampling interval**: Defaults to 512KB, typically recording ~10% of memory
with <10% performance impact. Modify `lg_prof_sample` in `JEMALLOC_CONF`
(default `19`, i.e., 2^19 B = 512KB). Decreasing `lg_prof_sample` increases
sampling frequency for more accurate profiles but higher overhead.
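The `lg_*` options in `JEMALLOC_CONF` are base-2 logarithms of a byte count. The short sketch below (the function name `lg_to_bytes` is only illustrative) shows the arithmetic for the default sampling interval, for a value one step lower, and for the `lg_prof_interval` value used for periodic dumps.

```bash
#!/bin/bash
# Converts a jemalloc lg_* setting (a base-2 logarithm) to a byte count.
lg_to_bytes() {
  echo $(( 1 << $1 ))
}

echo "lg_prof_sample=19   -> $(lg_to_bytes 19) bytes"   # 512 KB, the default
echo "lg_prof_sample=18   -> $(lg_to_bytes 18) bytes"   # 256 KB, samples twice as often
echo "lg_prof_interval=34 -> $(lg_to_bytes 34) bytes"   # 16 GB between periodic dumps
```

Lowering the value by one halves the interval, so sampling becomes roughly twice as frequent.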
-If you are doing performance testing, keep `prof:false` to avoid the
performance loss of Heap Dump.
+**Performance tip:** Keep `prof:false` during performance testing to avoid
Heap Dump overhead.
-##### 2. Regular Heap Dump for long-term memory observation
+##### 2. Periodic Heap Dump (for long-term memory observation)
-Change `prof:false` of `JEMALLOC_CONF` in `be.conf` to `prof:true`. The
directory where the Heap Profile file is located is `${DORIS_HOME}/log` by
default. The file name prefix is `JEMALLOC_PROF_PRFIX` in `be.conf`, and the
default is `jemalloc_heap_profile_`.
+Change `prof:false` to `prof:true` in `JEMALLOC_CONF` in `be.conf`. Heap
Profile files default to `${DORIS_HOME}/log` with prefix specified by
`JEMALLOC_PROF_PRFIX` (default `jemalloc_heap_profile_`).
-> Before Doris 2.1.6, `JEMALLOC_PROF_PRFIX` is empty and needs to be changed
to any value as the profile file name
+> **Note:** Before Doris 2.1.6, `JEMALLOC_PROF_PRFIX` was empty by default and must be set to some value, which is then used as the profile file name prefix.
-1. Dump when the cumulative memory application reaches a certain value:
+**Dump triggers:**
-Change `lg_prof_interval` of `JEMALLOC_CONF` in `be.conf` to 34. At this time,
the profile is dumped once when the cumulative memory application reaches 16GB
(2^35 B = 16GB). You can change it to any value to adjust the dump interval.
+1. **Dump after cumulative memory allocation**
-> Before Doris 2.1.6, `lg_prof_interval` defaults to 32.
+ Change `lg_prof_interval` to `34` in `JEMALLOC_CONF` to dump after
cumulative 16GB allocation (2^34 B = 16GB).
-2. Dump every time the memory reaches a new high:
+ > **Note:** Before Doris 2.1.6, `lg_prof_interval` defaulted to `32`.
-Change `prof_gdump` in `JEMALLOC_CONF` in `be.conf` to `true` and restart BE.
+2. **Dump on memory peak**
-3. Dump when the program exits, and detect memory leaks:
+ Change `prof_gdump` to `true` in `JEMALLOC_CONF` and restart BE.
-Change `prof_leak` and `prof_final` in `JEMALLOC_CONF` in `be.conf` to `true`
and restart BE.
+3. **Dump on exit and detect leaks**
-4. Dump the cumulative value (growth) of memory instead of the real-time value:
+ Change `prof_leak` and `prof_final` to `true` in `JEMALLOC_CONF` and
restart BE.
-Change `prof_accum` in `JEMALLOC_CONF` in `be.conf` to `true` and restart BE.
+4. **Dump cumulative (growth) instead of real-time values**
-Use `jeprof --alloc_space` to display the cumulative value of heap dump.
+ Change `prof_accum` to `true` in `JEMALLOC_CONF` and restart BE. Use
`jeprof --alloc_space` to display cumulative heap dump.
-##### 3. `jeprof` parses Heap Profile
+##### 3. Parse Heap Profile with `jeprof`
-Use `be/bin/jeprof` to parse the Heap Profile of the above dump. If the
process memory is too large, the parsing process may take several minutes.
Please wait patiently.
+Use `be/bin/jeprof` to parse dumped Heap Profiles. Parsing may take minutes
for large memory processes.
-If there is no `jeprof` binary in the `be/bin` directory of the Doris BE
deployment path, you can package the `jeprof` in the `doris/tools` directory
and upload it to the server.
+If the `jeprof` binary is missing from the `be/bin` directory of the BE deployment, package the `jeprof` found in the `doris/tools` directory and upload it to the server.
-> The addr2line version is required to be 2.35.2 or above, see QA-1 below for
details
-> Try to have Heap Dump and `jeprof` analyze Heap Profile on the same server,
that is, analyze Heap Profile directly on the machine running Doris BE as much
as possible, see QA-2 below for details
+> **Notes:**
+> - Requires addr2line version 2.35.2+, see QA-1 below.
+> - Execute Heap Dump and `jeprof` parsing on the same machine running Doris
BE, see QA-2 below.
-1. Analyze a single Heap Profile file
+**1. Analyze single Heap Profile**
-```shell
+```bash
jeprof --dot ${DORIS_HOME}/lib/doris_be ${DORIS_HOME}/log/profile_file
```
-After executing the above command, paste the text output by the terminal to
the [online dot drawing website](http://www.webgraphviz.com/) to generate a
memory allocation graph, and then analyze it.
+Paste the text printed to the terminal into an [online dot visualization site](http://www.webgraphviz.com/) to generate a memory allocation graph.
-If the server is convenient for file transfer, you can also use the following
command to directly generate a call relationship graph. The result.pdf file is
transferred to the local computer for viewing. You need to install the
dependencies required for drawing.
+To generate PDF directly (requires dependencies):
-```shell
+```bash
yum install ghostscript graphviz
jeprof --pdf ${DORIS_HOME}/lib/doris_be ${DORIS_HOME}/log/profile_file >
result.pdf
```
-[graphviz](http://www.graphviz.org/): Without this library, pprof can only be
converted to text format, but this method is not easy to view. After installing
this library, pprof can be converted to svg, pdf and other formats, and the
call relationship is clearer.
+**2. Analyze diff between two Heap Profiles**
-2. Analyze the diff of two heap profile files
-
-```shell
+```bash
jeprof --dot ${DORIS_HOME}/lib/doris_be --base=${DORIS_HOME}/log/profile_file
${DORIS_HOME}/log/profile_file2
```
-Multiple heap files can be generated by running the above command multiple
times over a period of time. You can select an earlier heap file as a baseline
and compare and analyze their diff with a later heap file. The method for
generating a call graph is the same as above.
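+Generate several heap files over a period of time, then pass an earlier file as `--base` and compare it with a later file to see what changed in between. The call graph is generated the same way as above.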
-##### 4. QA
+##### 4. Common Issues (QA)
-1. Many errors appear after running jeprof: `addr2line: Dwarf Error: found
dwarf version xxx, this reader only handles version xxx`.
+**QA-1: Errors after running jeprof: `addr2line: Dwarf Error: found dwarf
version xxx, this reader only handles version xxx`**
-GCC 11 and later use DWARF-v5 by default, which requires Binutils 2.35.2 and
above. Doris Ldb_toolchain uses GCC 11. See:
https://gcc.gnu.org/gcc-11/changes.html.
+GCC 11+ defaults to DWARF-v5, requiring Binutils 2.35.2+. Doris Ldb_toolchain
uses GCC 11.
-Replace addr2line to 2.35.2, refer to:
-```
-// Download addr2line source code
+Solution: Upgrade addr2line to 2.35.2.
+
+```bash
+# Download addr2line source
wget https://ftp.gnu.org/gnu/binutils/binutils-2.35.tar.bz2
-// Install dependencies, if necessary
+# Install dependencies if needed
yum install make gcc gcc-c++ binutils
-// Compile & install addr2line
+# Compile & install addr2line
tar -xvf binutils-2.35.tar.bz2
cd binutils-2.35
./configure --prefix=/usr/local
make
make install
-// Verify
+# Verify
addr2line -h
-// Replace addr2line
+# Replace addr2line
chmod +x addr2line
mv /usr/bin/addr2line /usr/bin/addr2line.bak
mv /bin/addr2line /bin/addr2line.bak
@@ -210,25 +218,26 @@ cp addr2line /bin/addr2line
cp addr2line /usr/bin/addr2line
hash -r
```
-Note that addr2line 2.3.9 cannot be used, which may be incompatible and cause
the memory to keep growing.
-
-2. Many errors appear after running `jeprof`: `addr2line: DWARF error: invalid
or unhandled FORM value: 0x25`, and the parsed Heap stack is the memory address
of the code, not the function name
-Usually, it is because the execution of Heap Dump and the execution of
`jeprof` to parse Heap Profile are not on the same server, which causes
`jeprof` to fail to parse the function name using the symbol table. Try to
complete the operation of Dump Heap and `jeprof` parsing on the same machine,
that is, try to parse the Heap Profile directly on the machine running Doris BE.
+**Note:** Don't use addr2line 2.3.9, which may be incompatible and cause
memory growth.
-Or confirm the Linux kernel version of the machine running Doris BE, download
the `be/bin/doris_be` binary file and the Heap Profile file to the machine with
the same kernel version and execute `jeprof`.
+**QA-2: Errors after running `jeprof`: `addr2line: DWARF error: invalid or
unhandled FORM value: 0x25`, parsed heap stacks show memory addresses instead
of function names**
-3. If the Heap stack after directly parsing the Heap Profile on the machine
running Doris BE is still the memory address of the code, not the function name
+Usually occurs when Heap Dump and `jeprof` parsing are on different servers,
causing symbol table resolution failure.
-Use the following script to manually parse the Heap Profile and modify these
variables:
+Solution:
+- Execute Dump Heap and `jeprof` parsing on the same machine running Doris BE.
+- Or download `be/bin/doris_be` binary and Heap Profile to a machine with
matching Linux kernel version and run `jeprof`.
-- heap: the file name of the Heap Profile.
+**QA-3: If heap stacks still show memory addresses instead of function names
after parsing on the BE machine**
-- bin: the file name of the `be/bin/doris_be` binary
+Use this script for manual parsing. Modify these variables:
-- llvm_symbolizer: the path of the llvm symbol table parser, the version
should preferably be the version used to compile the `be/bin/doris_be` binary.
+- `heap`: Heap Profile filename.
+- `bin`: `be/bin/doris_be` binary filename.
+- `llvm_symbolizer`: Path to llvm symbolizer, preferably the version used to
compile the binary.
-```
+```bash
#!/bin/bash
## @brief
## @author zhoufei
@@ -279,27 +288,30 @@ fi
# vim: et tw=80 ts=2 sw=2 cc=80:
```
-4. If all the above methods do not work
-
-- Try to recompile the `be/bin/doris_be` binary on the machine running Doris
BE, that is, compile, run, and `jeprof` analyze on the same machine.
+**QA-4: If none of the above methods work**
-- After the above operation, if the Heap stack is still the memory address of
the code, try `USE_JEMALLOC=OFF ./build.sh --be` to compile Doris BE using
TCMalloc, and then refer to the above section to use TCMalloc Heap Profile to
analyze memory.
+- Try recompiling `be/bin/doris_be` on the BE machine to compile, run, and
parse on the same machine.
+- If heap stacks still show addresses, try compiling with TCMalloc using
`USE_JEMALLOC=OFF ./build.sh --be`, then use TCMalloc Heap Profile as described
below.
-#### TCMalloc HEAP PROFILE
+#### TCMalloc Heap Profile
-> Doris 1.2.1 and earlier versions use TCMalloc. Doris 1.2.2 version uses
Jemalloc by default. To switch to TCMalloc, you can compile like this:
`USE_JEMALLOC=OFF sh build.sh --be`.
+> **Note:** Doris 1.2.1 and earlier use TCMalloc. Doris 1.2.2+ default to
Jemalloc. To switch back to TCMalloc, compile with `USE_JEMALLOC=OFF sh
build.sh --be`.
-When using TCMalloc, when a large memory application is encountered, the
application stack will be printed to the be.out file, and the general
expression is as follows:
+When using TCMalloc, large memory allocations print stacks to `be.out`:
-```
+```text
tcmalloc: large alloc 1396277248 bytes == 0x3f3488000 @ 0x2af6f63 0x2c4095b
0x134d278 0x134bdcb 0x133d105 0x133d1d0 0x19930ed
```
-This indicates that Doris be is trying to apply memory of '1396277248 bytes'
on this stack. We can use the 'addr2line' command to restore the stack to a
letter that we can understand. The specific example is shown below.
+This indicates Doris BE attempted to allocate `1396277248 bytes` at this
stack. Use `addr2line` to convert to readable information:
+```bash
+addr2line -e lib/doris_be 0x2af6f63 0x2c4095b 0x134d278 0x134bdcb 0x133d105
0x133d1d0 0x19930ed
```
-$ addr2line -e lib/doris_be 0x2af6f63 0x2c4095b 0x134d278 0x134bdcb 0x133d105
0x133d1d0 0x19930ed
+Output example:
+
+```text
/home/ssd0/zc/palo/doris/core/thirdparty/src/gperftools-gperftools-2.7/src/tcmalloc.cc:1335
/home/ssd0/zc/palo/doris/core/thirdparty/src/gperftools-gperftools-2.7/src/tcmalloc.cc:1357
/home/disk0/baidu-doris/baidu/bdg/doris-baidu/core/be/src/exec/hash_table.cpp:267
@@ -309,20 +321,24 @@ $ addr2line -e lib/doris_be 0x2af6f63 0x2c4095b
0x134d278 0x134bdcb 0x133d105 0
thread.cpp:?
```
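Copying the addresses out of `be.out` by hand is error-prone. As a minimal sketch, the hypothetical helper `tcmalloc_stack_addrs` keeps only the stack addresses that follow the `@` in a TCMalloc `large alloc` line, so they can be passed straight to `addr2line`. It assumes the log line format shown above.

```bash
#!/bin/bash
# Hypothetical helper: reads be.out lines on stdin and, for each TCMalloc
# "large alloc" line, prints only the stack addresses that follow the "@".
tcmalloc_stack_addrs() {
  sed -n 's/^tcmalloc: large alloc .* @ //p'
}

# Line taken from the example above.
echo 'tcmalloc: large alloc 1396277248 bytes == 0x3f3488000 @ 0x2af6f63 0x2c4095b 0x134d278 0x134bdcb 0x133d105 0x133d1d0 0x19930ed' |
  tcmalloc_stack_addrs
```

A possible use is `tcmalloc_stack_addrs < be.out | tail -n 1 | xargs addr2line -e lib/doris_be`, which resolves the most recent large allocation. The location of `be.out` depends on the deployment.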
-Sometimes the application of memory is not caused by the application of large
memory, but by the continuous accumulation of small memory. Then there is no
way to locate the specific application information by viewing the log, so you
need to get the information through other ways.
+Sometimes high memory usage is caused by many small allocations accumulating rather than by one large allocation, so nothing useful appears in the log. In that case, use TCMalloc's [HEAP PROFILE](https://gperftools.github.io/gperftools/heapprofile.html) feature by setting the `HEAPPROFILE` environment variable before starting Doris BE:
-At this time, we can take advantage of TCMalloc's
[heapprofile](https://gperftools.github.io/gperftools/heapprofile.html). If the
heapprofile function is set, we can get the overall memory application usage of
the process. The usage is to set the 'heapprofile' environment variable before
starting Doris be. For example:
-
-```
-export HEAPPROFILE=/tmp/doris_be.hprof
+```bash
+export TCMALLOC_SAMPLE_PARAMETER=64000 HEAP_PROFILE_ALLOCATION_INTERVAL=-1
HEAP_PROFILE_INUSE_INTERVAL=-1 HEAP_PROFILE_TIME_INTERVAL=5
HEAPPROFILE=/tmp/doris_be.hprof
./bin/start_be.sh --daemon
```
-In this way, when the dump condition of the heapprofile is met, the overall
memory usage will be written to the file in the specified path. Later, we can
use the 'pprof' tool to analyze the output content.
+> **Note:** `HEAPPROFILE` must be an absolute path, and the directory must already exist.
+
+When the `HEAPPROFILE` dump conditions are met, the overall memory usage is written to a file under the specified path. Use the `pprof` tool to analyze the output.
+
+```bash
+pprof --text lib/doris_be /tmp/doris_be.hprof.0012.heap | head -30
```
-$ pprof --text lib/doris_be /tmp/doris_be.hprof.0012.heap | head -30
+Output example:
+
+```text
Using local file lib/doris_be.
Using local file /tmp/doris_be.hprof.0012.heap.
Total: 668.6 MB
@@ -339,30 +355,35 @@ Total: 668.6 MB
1.7 0.3% 98.4% 1.7 0.3% doris::SegmentReader::_load_index
```
-Contents of each column of the above documents:
+**Column meanings:**
-* Column 1: the memory size directly applied by the function, in MB
-* Column 4: the total memory size of the function and all the functions it
calls.
-* The second column and the fifth column are the proportion values of the
first column and the fourth column respectively.
-* The third column is the cumulative value of the second column.
+- **Column 1**: Memory directly allocated by function (MB).
+- **Column 2**: Percentage of column 1.
+- **Column 3**: Cumulative value of column 2.
+- **Column 4**: Total memory occupied by function and all called functions
(MB).
+- **Column 5**: Percentage of column 4.
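Using the column layout just described, the report can be filtered down to the functions that matter. The sketch below is an assumption-laden example, not a `pprof` feature: `pprof_hot_functions` is a hypothetical helper that keeps the rows whose directly allocated share (column 2) is at least a given percentage.

```bash
#!/bin/bash
# Hypothetical helper: reads `pprof --text` output on stdin and prints the
# function name, directly allocated MB (column 1) and its share (column 2)
# for every row whose share is at least the given percentage.
pprof_hot_functions() {
  local min_pct="$1"
  awk -v m="$min_pct" '
    $2 ~ /%$/ {
      pct = $2
      sub(/%$/, "", pct)
      if (pct + 0 >= m + 0) {
        # The function name starts at column 6 and may contain spaces.
        name = $6
        for (i = 7; i <= NF; i++) name = name " " $i
        print name, $1, $2
      }
    }
  '
}

# Rows taken from the example output above.
printf '%s\n' \
  'Total: 668.6 MB' \
  '     1.7   0.3%  98.4%      1.7   0.3% doris::SegmentReader::_load_index' |
  pprof_hot_functions 0.2
```

With real data this would be used as `pprof --text lib/doris_be /tmp/doris_be.hprof.0012.heap | pprof_hot_functions 5`, where 5 is an arbitrary example threshold.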
-Of course, it can also generate call relation pictures, which is more
convenient for analysis. For example, the following command can generate a call
graph in SVG format.
+Generate call relationship graph in SVG format:
-```
-pprof --svg lib/doris_be /tmp/doris_be.hprof.0012.heap > heap.svg
+```bash
+pprof --svg lib/doris_be /tmp/doris_be.hprof.0012.heap > heap.svg
```
-**NOTE: turning on this option will affect the execution performance of the
program. Please be careful to turn on the online instance.**
+**Performance tip:** This option affects performance. Use cautiously on
production instances.
-##### pprof remote server
+##### pprof Remote Server
-Although heapprofile can get all the memory usage information, it has some
limitations. 1. Restart be. 2. You need to enable this command all the time,
which will affect the performance of the whole process.
+HEAP PROFILE has two limitations: it requires a BE restart, and it must stay enabled the whole time, which affects the performance of the entire process.
-For Doris be, you can also use the way of opening and closing the heap profile
dynamically to analyze the memory application of the process. Doris supports
the [remote server debugging of
gperftools](https://gperftools.github.io/gperftools/pprof_remote_servers.html).
Then you can use 'pprof' to directly perform dynamic head profile on the remote
running Doris be. For example, we can check the memory usage increment of Doris
through the following command
+Doris BE also supports enabling and disabling heap profiling dynamically through gperftools [remote server debugging](https://gperftools.github.io/gperftools/pprof_remote_servers.html), so `pprof` can profile a running Doris BE remotely. For example, to view the memory usage increment over 60 seconds:
+```bash
+pprof --text --seconds=60 http://be_host:be_webport/pprof/heap
```
-$ pprof --text --seconds=60 http://be_host:be_webport/pprof/heap
+Output example:
+
+```text
Total: 1296.4 MB
484.9 37.4% 37.4% 484.9 37.4% doris::StorageByteBuffer::create
272.2 21.0% 58.4% 273.3 21.1% doris::RowBlock::init
@@ -379,27 +400,27 @@ Total: 1296.4 MB
10.0 0.8% 93.4% 10.0 0.8%
doris::PlainTextLineReader::PlainTextLineReader
```
-The output of this command is the same as the output and view mode of heap
profile, which will not be described in detail here. Statistics will be enabled
only during execution of this command, which has a limited impact on process
performance compared with heap profile.
+The output format and the way to read it are the same as for HEAP PROFILE. Statistics are collected only while this command is running, so the performance impact is smaller than with HEAP PROFILE.
-#### LSAN
+#### LSAN (Memory Leak Detection)
-[LSAN](https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer)
is an address checking tool, GCC has been integrated. When we compile the
code, we can enable this function by turning on the corresponding compilation
options. When the program has a determinable memory leak, it prints the leak
stack. Doris be has integrated this tool, only need to compile with the
following command to generate be binary with memory leak detection version.
+[LSAN](https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer) is a memory leak detection tool integrated into GCC. It is enabled by turning on the corresponding option at compile time. When the program has a determinable memory leak, the allocation stack of the leaked memory is printed. Doris BE has integrated this tool. Compile with:
-```
+```bash
BUILD_TYPE=LSAN ./build.sh
```
-When the system detects a memory leak, it will output the corresponding
information in be. Out. For the following demonstration, we intentionally
insert a memory leak code into the code. We insert the following code into the
`open` function of `StorageEngine`.
+When a memory leak is detected, the corresponding information is written to `be.out`. For demonstration, we intentionally inject a memory leak into the `open` function of `StorageEngine`:
-```
- char* leak_buf = new char[1024];
- strcpy(leak_buf, "hello world");
- LOG(INFO) << leak_buf;
+```cpp
+char* leak_buf = new char[1024];
+strcpy(leak_buf, "hello world");
+LOG(INFO) << leak_buf;
```
-We get the following output in be.out
+Then `be.out` shows:
-```
+```text
=================================================================
==24732==ERROR: LeakSanitizer: detected memory leaks
@@ -412,33 +433,33 @@ Direct leak of 1024 byte(s) in 1 object(s) allocated from:
SUMMARY: LeakSanitizer: 1024 byte(s) leaked in 1 allocation(s).
```
-From the above output, we can see that 1024 bytes have been leaked, and the
stack information of memory application has been printed out.
+The output shows that 1024 bytes were leaked and prints the stack trace of the allocation.
-**NOTE: turning on this option will affect the execution performance of the
program. Please be careful to turn on the online instance.**
+**Performance tip:** This option affects performance. Use cautiously on
production instances.
-**NOTE: if the LSAN switch is turned on, the TCMalloc will be automatically
turned off**
+**Note:** Enabling LSAN automatically disables TCMalloc.
-#### ASAN
+#### ASAN (Address Legality Detection)
-Except for the unreasonable use and leakage of memory. Sometimes there will be
memory access illegal address and other errors. At this time, we can use
[ASAN](https://github.com/google/sanitizers/wiki/addresssanitizer) to help us
find the cause of the problem. Like LSAN, ASAN is integrated into GCC. Doris
can open this function by compiling as follows
+Besides improper memory usage and leaks, illegal address access errors can
occur. Use [ASAN](https://github.com/google/sanitizers/wiki/AddressSanitizer)
to find root causes. Like LSAN, ASAN is integrated in GCC. Compile Doris with:
-```
+```bash
BUILD_TYPE=ASAN ./build.sh
```
-Execute the binary generated by compilation. When the detection tool finds any
abnormal access, it will immediately exit and output the stack illegally
accessed in be.out. The output of ASAN is the same as that of LSAN. Here we
also actively inject an address access error to show the specific content
output. We still inject an illegal memory access into the 'open' function of
'storageengine'. The specific error code is as follows
+When abnormal access is detected, the binary exits immediately and outputs
illegal access stack to `be.out`. ASAN output analysis uses the same method as
LSAN. For demonstration, inject an address access error in the `StorageEngine`
`open` function:
-```
- char* invalid_buf = new char[1024];
- for (int i = 0; i < 1025; ++i) {
- invalid_buf[i] = i;
- }
- LOG(INFO) << invalid_buf;
+```cpp
+char* invalid_buf = new char[1024];
+for (int i = 0; i < 1025; ++i) {
+ invalid_buf[i] = i;
+}
+LOG(INFO) << invalid_buf;
```
-We get the following output in be.out
+Then `be.out` shows:
-```
+```text
=================================================================
==23284==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x61900008bf80 at pc 0x00000129f56a bp 0x7fff546eed90 sp 0x7fff546eed88
WRITE of size 1 at 0x61900008bf80 thread T0
@@ -447,7 +468,7 @@ WRITE of size 1 at 0x61900008bf80 thread T0
#2 0x7fa5580fbbd4 in __libc_start_main
(/opt/compiler/gcc-4.8.2/lib64/libc.so.6+0x21bd4)
#3 0xd30794
(/home/ssd0/zc/palo/doris/core/output3/be/lib/doris_be+0xd30794)
-0x61900008bf80 is located 0 bytes to the right of 1024-byte region
[0x61900008bb80,0x61900008bf80]
+0x61900008bf80 is located 0 bytes to the right of 1024-byte region
[0x61900008bb80,0x61900008bf80)
allocated by thread T0 here:
#0 0xdeb040 in operator new[](unsigned long)
../../../../gcc-7.3.0/libsanitizer/asan/asan_new_delete.cc:82
#1 0x129f50d in doris::StorageEngine::open(doris::EngineOptions const&,
doris::StorageEngine**)
/home/ssd0/zc/palo/doris/core/be/src/olap/storage_engine.cpp:104
@@ -457,66 +478,69 @@ allocated by thread T0 here:
SUMMARY: AddressSanitizer: heap-buffer-overflow
/home/ssd0/zc/palo/doris/core/be/src/olap/storage_engine.cpp:106 in
doris::StorageEngine::open(doris::EngineOptions const&, doris::StorageEngine**)
```
-From this message, we can see that at the address of `0x61900008bf80`, we
tried to write a byte, but this address is illegal. We can also see the
application stack of the address `[0x61900008bb80, 0x61900008bf80]`.
+The report shows a one-byte write to the illegal address `0x61900008bf80`,
which lies just past the end of the 1024-byte region
`[0x61900008bb80,0x61900008bf80)`, followed by the stack that allocated that
region.
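As a quick sanity check on the report above, the region bounds can be subtracted in the shell. This is a minimal sketch; the helper names `region_size` and `offset_in_region` are hypothetical and not part of Doris or ASAN.

```bash
#!/usr/bin/env bash
# Size in bytes of a half-open address range [start, end).
region_size() {
  local start=$1 end=$2
  echo $(( end - start ))
}

# Offset in bytes of an address from the start of a region.
offset_in_region() {
  local start=$1 addr=$2
  echo $(( addr - start ))
}

# Region length reported by ASAN, then the offset of the faulting write.
region_size 0x61900008bb80 0x61900008bf80
offset_in_region 0x61900008bb80 0x61900008bf80
```

Both values come out as 1024. The region holds indexes 0 through 1023, so the write in the last loop iteration (`i = 1024`) lands on the first byte past the end, which is why the interval is written half-open.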
-**NOTE: turning on this option will affect the execution performance of the
program. Please be careful to turn on the online instance.**
+**Performance tip:** Enabling ASAN degrades execution performance. Enable it
on production instances with caution.
-**NOTE: if the ASAN switch is turned on, the TCMalloc will be automatically
turned off**
+**Note:** Enabling ASAN automatically disables TCMalloc.
-In addition, if stack information is output in be.out, but there is no
function symbol, then we need to handle it manually to get readable stack
information. The specific processing method needs a script to parse the output
of ASAN. At this time, we need to use
[asan_symbolize](https://llvm.org/svn/llvm-project/compiler-rt/trunk/lib/asan/scripts/asan_symbolize.py)
to help with parsing. The specific usage is as follows:
+If the stack frames in `be.out` lack function symbols, they must be symbolized
manually. Use the
[asan_symbolize](https://llvm.org/svn/llvm-project/compiler-rt/trunk/lib/asan/scripts/asan_symbolize.py)
script to parse the ASAN output:
-```
+```bash
cat be.out | python asan_symbolize.py | c++filt
```
-With the above command, we can get readable stack information.
+This command produces readable stack information.
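To decide whether symbolization is needed at all, one option is to count the frames that carry only an address and a module offset. The sketch below assumes each frame sits on a single line in the real `be.out`, laid out as in the report above (`#N 0xADDR in function ...` when symbolized, `#N 0xADDR (module+0xOFFSET)` when not). `count_unsymbolized_frames` is a hypothetical helper, not part of ASAN or Doris.

```bash
#!/usr/bin/env bash
# Count stack frames that have no function name, i.e. lines shaped like
#   #3 0xd30794 (/path/to/doris_be+0xd30794)
# Symbolized frames have " in <function>" after the address and are not counted.
count_unsymbolized_frames() {
  local log_file=$1
  grep -cE '^[[:space:]]*#[0-9]+[[:space:]]+0x[0-9a-fA-F]+[[:space:]]+\(' "$log_file"
}
```

If `count_unsymbolized_frames be.out` prints a non-zero number, running the `asan_symbolize.py | c++filt` pipeline is likely worthwhile.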
-### CPU
+### CPU Debugging
-When the CPU idle of the system is very low, it means that the CPU of the
system has become the main bottleneck. At this time, it is necessary to analyze
the current CPU usage. For the be of Doris, there are two ways to analyze the
CPU bottleneck of Doris.
+When the system's CPU idle percentage is very low, the CPU has become the main
bottleneck and current CPU usage needs to be analyzed. For Doris BE, there are
two ways to analyze CPU bottlenecks.
#### pprof
-[pprof](https://github.com/google/pprof): from gperftools, it is used to
transform the content generated by gperftools into a format that is easy for
people to read, such as PDF, SVG, text, etc.
+[pprof](https://github.com/google/pprof), which comes from gperftools,
converts gperftools output into human-readable formats such as PDF, SVG, and
text.
-Because Doris has integrated and compatible with GPERF rest interface, users
can analyze remote Doris be through the 'pprof' tool. The specific usage is as
follows:
+Because Doris integrates and is compatible with the GPerf REST interface, the
`pprof` tool can analyze a remote Doris BE:
-```
-pprof --svg --seconds=60 http://be_host:be_webport/pprof/profile > be.svg
+```bash
+pprof --svg --seconds=60 http://be_host:be_webport/pprof/profile > be.svg
```
-In this way, a CPU consumption graph of be execution can be generated.
+This command generates a BE CPU consumption graph.

-#### perf + flamegragh
+#### perf + FlameGraph
-This is a quite common CPU analysis method. Compared with `pprof`, this method
must be able to log in to the physical machine of the analysis object. However,
compared with pprof, which can only collect points on time, perf can collect
stack information through different events. The specific usage is as follows:
+This is a widely used CPU analysis method. Unlike `pprof`, it requires logging
in to the physical machine being analyzed. However, whereas `pprof` can only
sample at timed intervals, perf can collect stack information on different
events.
-[perf](https://perf.wiki.kernel.org/index.php/main_page): Linux kernel comes
with performance analysis tool. [here](http://www.brendangregg.com/perf.html)
there are some examples of perf usage.
+**Tool introduction:**
-[flamegraph](https://github.com/brendangregg/flamegraph): a visualization tool
used to show the output of perf in the form of flame graph.
+- [perf](https://perf.wiki.kernel.org/index.php/Main_Page): The performance
analysis tool that ships with the Linux kernel. Some perf usage examples are
available [here](http://www.brendangregg.com/perf.html).
+- [FlameGraph](https://github.com/brendangregg/FlameGraph): A visualization
tool that renders perf output as flame graphs.
-```
+**Usage:**
+
+```bash
perf record -g -p be_pid -- sleep 60
```
-This command counts the CPU operation of be for 60 seconds and generates
perf.data. For the analysis of perf.data, the command of perf can be used for
analysis.
+This command profiles BE CPU usage for 60 seconds and generates a `perf.data`
file. Analyze `perf.data` with the `perf report` command:
-```
+```bash
perf report
```
-The analysis results in the following pictures
+Analysis example:

-To analyze the generated content. Of course, you can also use flash graph to
complete the visual display.
+Or visualize with FlameGraph:
-```
+```bash
perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > be.svg
```
-This will also generate a graph of CPU consumption at that time.
+This also generates a CPU consumption graph.
-
+
\ No newline at end of file
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/developer-guide/debug-tool.md
b/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/developer-guide/debug-tool.md
index 4060bc44ed1..0082e9583ed 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/developer-guide/debug-tool.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/developer-guide/debug-tool.md
@@ -1,7 +1,8 @@
---
{
"title": "调试工具",
- "language": "zh-CN"
+ "language": "zh-CN",
+ "description": "介绍 Apache Doris 的常用调试工具和方法,包括 FE 和 BE
的调试技巧,如内存分析、线程分析、性能监控等实用调试手段。"
}
---
@@ -26,182 +27,192 @@ under the License.
# 调试工具
-在Doris的使用、开发过程中,经常会遇到需要对Doris进行调试的场景,这里介绍一些常用的调试工具。
+在 Doris 的使用和开发过程中,经常需要对 Doris 进行调试。本文档介绍了一些常用的调试工具和方法。
-**文中的出现的BE二进制文件名称 `doris_be`,在之前的版本中为 `palo_be`。**
+**注意:文中出现的 BE 二进制文件名称 `doris_be` 在早期版本中为 `palo_be`。**
## FE 调试
-FE 是 Java 进程。这里只列举一下简单常用的 java 调试命令。
+FE 是 Java 进程,以下列举一些常用的 Java 调试命令。
-1. 统计当前内存使用明细
+### 1. 统计当前内存使用明细
- ```
- jmap -histo:live pid > 1.jmp
- ```
+```bash
+jmap -histo:live pid > 1.jmp
+```
- 该命令可以列举存活的对象的内存占用并排序。(pid 换成 FE 进程 id)
+该命令可以列举存活对象的内存占用情况并排序(将 pid 替换为 FE 进程 ID)。
- ```
- num #instances #bytes class name
- ----------------------------------------------
- 1: 33528 10822024 [B
- 2: 80106 8662200 [C
- 3: 143 4688112 [Ljava.util.concurrent.ForkJoinTask;
- 4: 80563 1933512 java.lang.String
- 5: 15295 1714968 java.lang.Class
- 6: 45546 1457472
java.util.concurrent.ConcurrentHashMap$Node
- 7: 15483 1057416 [Ljava.lang.Object;
- ```
+```text
+ num #instances #bytes class name
+----------------------------------------------
+ 1: 33528 10822024 [B
+ 2: 80106 8662200 [C
+ 3: 143 4688112 [Ljava.util.concurrent.ForkJoinTask;
+ 4: 80563 1933512 java.lang.String
+ 5: 15295 1714968 java.lang.Class
+ 6: 45546 1457472 java.util.concurrent.ConcurrentHashMap$Node
+ 7: 15483 1057416 [Ljava.lang.Object;
+```
- 可以通过这个方法查看目前存活对象占用的总内存(在文件最后),以及分析哪些对象占用了更多的内存。
+通过该方法可以查看当前存活对象占用的总内存(在文件末尾),以及分析哪些对象占用了更多的内存。
- 注意,这个方法因指定了 `:live`,因此会触发 FullGC。
+**注意:** 该方法因指定了 `:live` 参数,会触发 FullGC。
-2. 查看 JVM 内存使用
+### 2. 查看 JVM 内存使用
- ```
- jstat -gcutil pid 1000 1000
- ```
+```bash
+jstat -gcutil pid 1000 1000
+```
- 该命令可以滚动查看当前 JVM 各区域的内存使用情况。(pid 换成 FE 进程 id)
+该命令可以每隔 1 秒查看一次当前 JVM 各区域的内存使用情况(将 pid 替换为 FE 进程 ID)。
- ```
- S0 S1 E O M CCS YGC YGCT FGC FGCT
GCT
- 0.00 0.00 22.61 3.03 95.74 92.77 68 1.249 5 0.794
2.043
- 0.00 0.00 22.61 3.03 95.74 92.77 68 1.249 5 0.794
2.043
- 0.00 0.00 22.61 3.03 95.74 92.77 68 1.249 5 0.794
2.043
- 0.00 0.00 22.92 3.03 95.74 92.77 68 1.249 5 0.794
2.043
- 0.00 0.00 22.92 3.03 95.74 92.77 68 1.249 5 0.794
2.043
- ```
+```text
+ S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
+ 0.00 0.00 22.61 3.03 95.74 92.77 68 1.249 5 0.794
2.043
+ 0.00 0.00 22.61 3.03 95.74 92.77 68 1.249 5 0.794
2.043
+ 0.00 0.00 22.61 3.03 95.74 92.77 68 1.249 5 0.794
2.043
+ 0.00 0.00 22.92 3.03 95.74 92.77 68 1.249 5 0.794
2.043
+ 0.00 0.00 22.92 3.03 95.74 92.77 68 1.249 5 0.794
2.043
+```
- 其中主要关注 Old区(O)的占用百分比(如示例中为 3%)。如果占用过高,则可能出现 OOM 或 FullGC。
+重点关注 Old 区(O)的占用百分比(如示例中为 3.03%)。如果占用过高,则可能出现 OOM 或 FullGC。
-3. 打印 FE 线程堆栈
+### 3. 打印 FE 线程堆栈
- ```
- jstack -l pid > 1.js
- ```
+```bash
+jstack -l pid > 1.js
+```
- 该命令可以打印当前 FE 的线程堆栈。(pid 换成 FE 进程 id)。
+该命令可以打印当前 FE 的线程堆栈(将 pid 替换为 FE 进程 ID)。
- `-l` 参数会同时检测是否有死锁。该方法可以查看 FE 线程运行情况,是否有死锁,哪里卡住了等问题。
+`-l` 参数会同时检测是否存在死锁。该方法可用于查看 FE 线程运行情况、是否存在死锁、定位阻塞位置等问题。
## BE 调试
-### 内存
+### 内存调试
-对于内存的调试一般分为两个方面。一个是内存使用的总量是否合理,内存使用量过大一方面可能是由于系统存在内存泄露,另一方面可能是因为程序内存使用不当。其次就是是否存在内存越界、非法访问的问题,比如程序访问一个非法地址的内存,使用了未初始化内存等。对于内存方面的调试我们一般使用如下几种方式来进行问题追踪。
+内存调试主要关注两个方面:
-#### Jemalloc HEAP PROFILE
+1. **内存使用量是否合理**:内存使用量过大可能是系统存在内存泄漏,或程序内存使用不当。
+2. **内存访问是否合法**:是否存在内存越界、非法访问等问题,例如访问非法地址或使用未初始化的内存。
-> Doris 1.2.2 版本开始默认使用 Jemalloc 作为内存分配器.
+针对这些问题,可以使用以下工具进行追踪和分析。
-有关 Heap Profile 的原理解析参考 [Heap Profiling
原理解析](https://cn.pingcap.com/blog/an-explanation-of-the-heap-profiling-principle/),需要注意的是
Heap Profile 记录的是虚拟内存
+#### Jemalloc Heap Profile
-支持实时和定期两种方式 Heap Dump,然后使用 `jeprof` 解析生成的 Heap Profile。
+> **说明:** Doris 1.2.2 版本开始默认使用 Jemalloc 作为内存分配器。
-##### 1. 实时 Heap Dump,用于分析实时内存
+Heap Profile 的原理解析可参考 [Heap Profiling
原理解析](https://cn.pingcap.com/blog/an-explanation-of-the-heap-profiling-principle/)。需要注意的是,Heap
Profile 记录的是虚拟内存。
-将 `be.conf` 中 `JEMALLOC_CONF` 的 `prof:false` 修改为 `prof:true`,将
`prof_active:false` 修改为 `prof_active:true` 并重启 Doris BE,然后使用 Jemalloc Heap Dump
HTTP 接口,在对应的BE机器上生成 Heap Profile 文件。
+Jemalloc 支持实时和定期两种 Heap Dump 方式,然后使用 `jeprof` 工具解析生成的 Heap Profile。
-> Doris 2.1.8 和 3.0.4 及之后的版本,`JEMALLOC_CONF` 中 `prof` 已经默认为 `true`,无需修改。
-> Doris 2.1.8 和 3.0.4 之前的版本, `JEMALLOC_CONF` 中没有 `prof_active`,只需将
`prof:false` 修改为 `prof:true` 即可。
+##### 1. 实时 Heap Dump(用于分析实时内存)
-```shell
+将 `be.conf` 中 `JEMALLOC_CONF` 的 `prof:false` 修改为 `prof:true`,将
`prof_active:false` 修改为 `prof_active:true`,然后重启 Doris BE。之后使用 Jemalloc Heap
Dump HTTP 接口在 BE 机器上生成 Heap Profile 文件。
+
+> **版本说明:**
+> - Doris 2.1.8 和 3.0.4 及之后的版本:`JEMALLOC_CONF` 中 `prof` 已默认为 `true`,无需修改。
+> - Doris 2.1.8 和 3.0.4 之前的版本:`JEMALLOC_CONF` 中没有 `prof_active` 选项,只需将
`prof:false` 修改为 `prof:true` 即可。
+
+```bash
curl http://be_host:be_webport/jeheap/dump
```
-Heap Profile 文件所在目录可以在 `be.conf` 中通过 `jeprofile_dir` 变量进行配置,默认为
`${DORIS_HOME}/log`
+**配置说明:**
+
+- **Heap Profile 文件目录**:可在 `be.conf` 中通过 `jeprofile_dir` 变量配置,默认为
`${DORIS_HOME}/log`。
+- **采样间隔**:默认为 512KB,通常只记录约 10% 的内存,性能影响通常小于 10%。可以修改 `be.conf` 中
`JEMALLOC_CONF` 的 `lg_prof_sample` 参数(默认为 `19`,即 2^19 B = 512KB)。减小
`lg_prof_sample` 可以更频繁采样,使 Heap Profile 更接近真实内存,但会带来更大的性能损耗。
+
+**性能提示:** 如果在做性能测试,建议保持 `prof:false` 以避免 Heap Dump 的性能开销。
-默认采样间隔为 512K,这通常只会有 10% 的内存被记录,对性能的影响通常小于 10%,可以修改 `be.conf` 中 `JEMALLOC_CONF`
的 `lg_prof_sample`,默认为 `19` (2^19 B = 512K),减小 `lg_prof_sample` 可以更频繁的采样使 Heap
Profile 接近真实内存,但这会带来更大的性能损耗。
+##### 2. 定期 Heap Dump(用于长时间观测内存)
-如果你在做性能测试,保持 `prof:false` 来避免 Heap Dump 的性能损耗。
+将 `be.conf` 中 `JEMALLOC_CONF` 的 `prof:false` 修改为 `prof:true`。Heap Profile
文件默认保存在 `${DORIS_HOME}/log` 目录,文件名前缀由 `be.conf` 中的 `JEMALLOC_PROF_PRFIX` 指定,默认为
`jemalloc_heap_profile_`。
-##### 2. 定期 Heap Dump,用于长时间观测内存
+> **注意:** 在 Doris 2.1.6 之前,`JEMALLOC_PROF_PRFIX` 为空,需要修改为任意值作为 profile 文件名。
-将 `be.conf` 中 `JEMALLOC_CONF` 的 `prof:false` 修改为 `prof:true`,Heap Profile
文件所在目录默认为 `${DORIS_HOME}/log`, 文件名前缀是 `be.conf` 中的 `JEMALLOC_PROF_PRFIX`,默认是
`jemalloc_heap_profile_`。
+**Dump 触发方式:**
-> 在 Doris 2.1.6 之前,`JEMALLOC_PROF_PRFIX` 为空,需要修改为任意值作为 profile 文件名
+1. **内存累计申请一定值时 Dump**
-1. 内存累计申请一定值时dump:
+ 将 `be.conf` 中 `JEMALLOC_CONF` 的 `lg_prof_interval` 修改为 `34`,此时内存累计申请
16GB(2^34 B = 16GB)时会 Dump 一次 profile。可以修改为任意值来调整 Dump 间隔。
- 将 `be.conf` 中 `JEMALLOC_CONF` 的 `lg_prof_interval` 修改为 34,此时内存累计申请 16GB
(2^35 B = 16GB) 时 dump 一次 profile,可以修改为任意值来调整dump间隔。
+ > **注意:** 在 Doris 2.1.6 之前,`lg_prof_interval` 默认就是 `32`。
-> 在 Doris 2.1.6 之前,`lg_prof_interval` 默认就是32。
+2. **内存每次达到新高时 Dump**
-2. 内存每次达到新高时dump:
+ 将 `be.conf` 中 `JEMALLOC_CONF` 的 `prof_gdump` 修改为 `true` 并重启 BE。
- 将 `be.conf` 中 `JEMALLOC_CONF` 的 `prof_gdump` 修改为 `true` 并重启BE。
+3. **程序退出时 Dump 并检测内存泄漏**
-3. 程序退出时dump, 并检测内存泄漏:
+ 将 `be.conf` 中 `JEMALLOC_CONF` 的 `prof_leak` 和 `prof_final` 修改为 `true` 并重启
BE。
- 将 `be.conf` 中 `JEMALLOC_CONF` 的 `prof_leak` 和 `prof_final` 修改为 `true` 并重启BE。
+4. **Dump 内存累计值(growth)而非实时值**
-4. dump内存累计值(growth),而不是实时值:
+ 将 `be.conf` 中 `JEMALLOC_CONF` 的 `prof_accum` 修改为 `true` 并重启 BE。使用 `jeprof
--alloc_space` 展示 heap dump 累计值。
- 将 `be.conf` 中 `JEMALLOC_CONF` 的 `prof_accum` 修改为 `true` 并重启BE。
- 使用 `jeprof --alloc_space` 展示 heap dump 累计值。
+##### 3. 使用 `jeprof` 解析 Heap Profile
-##### 3. `jeprof` 解析 Heap Profile
+使用 `be/bin/jeprof` 解析上面 Dump 的 Heap Profile。如果进程内存较大,解析过程可能需要几分钟,请耐心等待。
-使用 `be/bin/jeprof` 解析上面 Dump 的 Heap Profile,如果进程内存太大,解析过程可能需要几分钟,请耐心等待。
+若 Doris BE 部署路径的 `be/bin` 目录下没有 `jeprof` 二进制文件,可以将 `doris/tools` 目录下的 `jeprof`
打包后上传到服务器。
-若 Doris BE 部署路径的 `be/bin` 目录下没有 `jeprof` 这个二进制,可以将 `doris/tools` 目录下的 `jeprof`
打包后上传到服务器。
+> **注意事项:**
+> - 需要 addr2line 版本为 2.35.2 及以上,详情见下面的 QA-1。
+> - 尽可能在运行 Doris BE 的机器上直接执行 Heap Dump 和 `jeprof` 解析,详情见下面的 QA-2。
-> 需要 addr2line 版本为 2.35.2 及以上, 详情见下面的 QA-1
-> 尽可能让执行 Heap Dump 和执行 `jeprof` 解析 Heap Profile 在同一台服务器上,即尽可能在运行 Doris BE
的机器上直接解析 Heap Profile,详情见下面的 QA-2
+**1. 分析单个 Heap Profile 文件**
-1. 分析单个 Heap Profile 文件
+```bash
+jeprof --dot ${DORIS_HOME}/lib/doris_be ${DORIS_HOME}/log/profile_file
+```
-```shell
- jeprof --dot ${DORIS_HOME}/lib/doris_be ${DORIS_HOME}/log/profile_file
- ```
+执行完上述命令后,将终端输出的文本贴到[在线 dot 绘图网站](http://www.webgraphviz.com/),生成内存分配图进行分析。
- 执行完上述命令后将终端输出的文本贴到[在线dot绘图网站](http://www.webgraphviz.com/),生成内存分配图,然后进行分析。
+如果服务器方便传输文件,也可以直接生成调用关系图 PDF 文件。需要先安装绘图所需的依赖项:
- 如果服务器方便传输文件,也可以通过如下命令直接生成调用关系图 result.pdf 文件传输到本地后进行查看,需要安装绘图所需的依赖项。
+```bash
+yum install ghostscript graphviz
+jeprof --pdf ${DORIS_HOME}/lib/doris_be ${DORIS_HOME}/log/profile_file > result.pdf
+```
-```shell
- yum install ghostscript graphviz
- jeprof --pdf ${DORIS_HOME}/lib/doris_be ${DORIS_HOME}/log/profile_file >
result.pdf
- ```
+[graphviz](http://www.graphviz.org/):在没有这个库时 pprof 只能转换为 text
格式,但这种方式不易查看。安装后,pprof 可以转换为 SVG、PDF 等格式,调用关系更加清晰。
- [graphviz](http://www.graphviz.org/):
在没有这个库的时候pprof只可以转化为text格式,但这种方式不易查看。那么安装这个库后,pprof可以转化为svg、pdf等格式,对于调用关系则更加清晰明了。
+**2. 分析两个 Heap Profile 文件的 diff**
-2. 分析两个 Heap Profile 文件的diff
+```bash
+jeprof --dot ${DORIS_HOME}/lib/doris_be --base=${DORIS_HOME}/log/profile_file ${DORIS_HOME}/log/profile_file2
+```
-```shell
- jeprof --dot ${DORIS_HOME}/lib/doris_be
--base=${DORIS_HOME}/log/profile_file ${DORIS_HOME}/log/profile_file2
- ```
+通过在一段时间内多次执行 Heap Dump,可以生成多个 heap 文件。选取较早时间的 heap 文件作为 baseline,与较晚时间的 heap
文件进行对比分析 diff。生成调用关系图的方法同上。
- 通过在一段时间内多次运行上述命令可以生成多个 heap 文件,可以选取较早时间的 heap 文件作为 baseline,与较晚时间的 heap
文件对比分析它们的diff,生成调用关系图的方法同上。
+##### 4. 常见问题(QA)
-##### 4. QA
+**QA-1:运行 jeprof 后出现大量错误:`addr2line: Dwarf Error: found dwarf version xxx,
this reader only handles version xxx`**
-1. 运行 jeprof 后出现很多错误: `addr2line: Dwarf Error: found dwarf version xxx, this
reader only handles version xxx`.
+GCC 11 之后默认使用 DWARF-v5,这要求 Binutils 2.35.2 及以上。Doris Ldb_toolchain 使用了 GCC
11。参考:https://gcc.gnu.org/gcc-11/changes.html。
-GCC 11 之后默认使用 DWARF-v5 ,这要求Binutils 2.35.2 及以上,Doris Ldb_toolchain 用了 GCC
11。see: https://gcc.gnu.org/gcc-11/changes.html。
+解决方法:升级 addr2line 到 2.35.2 版本。
-替换 addr2line 到 2.35.2,参考:
-```
-// 下载 addr2line 源码
+```bash
+# 下载 addr2line 源码
wget https://ftp.gnu.org/gnu/binutils/binutils-2.35.tar.bz2
-// 安装依赖项,如果需要
+# 安装依赖项(如果需要)
yum install make gcc gcc-c++ binutils
-// 编译&安装 addr2line
+# 编译 & 安装 addr2line
tar -xvf binutils-2.35.tar.bz2
cd binutils-2.35
./configure --prefix=/usr/local
make
make install
-// 验证
+# 验证
addr2line -h
-// 替换 addr2line
+# 替换 addr2line
chmod +x addr2line
mv /usr/bin/addr2line /usr/bin/addr2line.bak
mv /bin/addr2line /bin/addr2line.bak
@@ -209,23 +220,26 @@ cp addr2line /bin/addr2line
cp addr2line /usr/bin/addr2line
hash -r
```
-注意,不能使用 addr2line 2.3.9, 这可能不兼容,导致内存一直增长。
-2. 运行 `jeprof` 后出现很多错误: `addr2line: DWARF error: invalid or unhandled FORM
value: 0x25`,解析后的 Heap 栈都是代码的内存地址,而不是函数名称
+**注意:** 不能使用 addr2line 2.3.9,该版本可能不兼容,导致内存一直增长。
-通常是因为执行 Heap Dump 和执行 `jeprof` 解析 Heap Profile 不在同一台服务器上,导致 `jeprof`
使用符号表解析函数名称失败,尽可能在同一台机器上完成 Dump Heap 和 `jeprof` 解析的操作,,即尽可能在运行 Doris BE
的机器上直接解析 Heap Profile。
+**QA-2:运行 `jeprof` 后出现大量错误:`addr2line: DWARF error: invalid or unhandled FORM
value: 0x25`,解析后的 Heap 栈都是代码的内存地址而非函数名称**
-或者确认下运行 Doris BE 的机器 Linux 内核版本,将 `be/bin/doris_be` 二进制文件和 Heap Profile
文件下载到相同内核版本的机器上执行 `jeprof`。
+通常是因为执行 Heap Dump 和执行 `jeprof` 解析 Heap Profile 不在同一台服务器上,导致 `jeprof`
使用符号表解析函数名称失败。
-3. 如果在运行 Doris BE 的机器上直接解析 Heap Profile 后的 Heap 栈依然是代码的内存地址,而不是函数名称
+解决方法:
+- 尽可能在同一台机器上完成 Dump Heap 和 `jeprof` 解析的操作,即尽可能在运行 Doris BE 的机器上直接解析 Heap
Profile。
+- 或者确认运行 Doris BE 的机器 Linux 内核版本,将 `be/bin/doris_be` 二进制文件和 Heap Profile
文件下载到相同内核版本的机器上执行 `jeprof`。
-使用下面的脚本,手动解析 Heap Profile,修改这几个变量:
+**QA-3:如果在运行 Doris BE 的机器上直接解析 Heap Profile 后,Heap 栈依然是代码的内存地址而非函数名称**
-- heap: Heap Profile 的文件名。
-- bin: `be/bin/doris_be` 二进制文件名
-- llvm_symbolizer: llvm 符号表解析程序的路径,版本最好是编译 `be/bin/doris_be` 二进制使用的版本。
+使用下面的脚本手动解析 Heap Profile,修改这几个变量:
-```
+- `heap`:Heap Profile 的文件名。
+- `bin`:`be/bin/doris_be` 二进制文件名。
+- `llvm_symbolizer`:llvm 符号表解析程序的路径,版本最好是编译 `be/bin/doris_be` 二进制使用的版本。
+
+```bash
#!/bin/bash
## @brief
## @author zhoufei
@@ -276,27 +290,30 @@ fi
# vim: et tw=80 ts=2 sw=2 cc=80:
```
-4. 如果上面所有的方法都不行
-
-- 尝试在运行 Doris BE 的机器上重新编译 `be/bin/doris_be` 二进制,也就是让编译、运行、`jeprof` 解析在同一台机器上。
+**QA-4:如果上面所有的方法都不行**
-- 上面的操作后,如果 Heap 栈依然是代码的内存地址,尝试 `USE_JEMALLOC=OFF ./build.sh --be` 编译使用
TCMalloc 的 Doris BE,然后参考上面的章节使用 TCMalloc Heap Profile 分析内存。
+- 尝试在运行 Doris BE 的机器上重新编译 `be/bin/doris_be` 二进制,让编译、运行、`jeprof` 解析在同一台机器上。
+- 如果上述操作后 Heap 栈依然是代码的内存地址,尝试使用 `USE_JEMALLOC=OFF ./build.sh --be` 编译使用
TCMalloc 的 Doris BE,然后参考下面的章节使用 TCMalloc Heap Profile 分析内存。
-#### TCMalloc HEAP PROFILE
+#### TCMalloc Heap Profile
-> Doris 1.2.1 及之前版本使用 TCMalloc,Doris 1.2.2 版本开始默认使用 Jemalloc,如需切换 TCMalloc
可以这样编译 `USE_JEMALLOC=OFF sh build.sh --be`。
+> **说明:** Doris 1.2.1 及之前版本使用 TCMalloc,Doris 1.2.2 版本开始默认使用 Jemalloc。如需切换回
TCMalloc,可使用 `USE_JEMALLOC=OFF sh build.sh --be` 进行编译。
-当使用 TCMalloc 时,遇到大内存申请会将申请的堆栈打印到be.out文件中,一般的表现形式如下:
+当使用 TCMalloc 时,遇到大内存申请会将申请的堆栈打印到 `be.out` 文件中,一般的表现形式如下:
-```
+```text
tcmalloc: large alloc 1396277248 bytes == 0x3f3488000 @ 0x2af6f63 0x2c4095b
0x134d278 0x134bdcb 0x133d105 0x133d1d0 0x19930ed
```
-这个表示在Doris BE在这个堆栈上尝试申请`1396277248
bytes`的内存。我们可以通过`addr2line`命令去把堆栈还原成我们能够看懂的信,具体的例子如下所示。
+这表示 Doris BE 在该堆栈上尝试申请 `1396277248 bytes` 的内存。可以通过 `addr2line`
命令将堆栈还原成可读的信息,具体示例如下:
+```bash
+addr2line -e lib/doris_be 0x2af6f63 0x2c4095b 0x134d278 0x134bdcb 0x133d105 0x133d1d0 0x19930ed
```
-$ addr2line -e lib/doris_be 0x2af6f63 0x2c4095b 0x134d278 0x134bdcb 0x133d105
0x133d1d0 0x19930ed
+输出示例:
+
+```text
/home/ssd0/zc/palo/doris/core/thirdparty/src/gperftools-gperftools-2.7/src/tcmalloc.cc:1335
/home/ssd0/zc/palo/doris/core/thirdparty/src/gperftools-gperftools-2.7/src/tcmalloc.cc:1357
/home/disk0/baidu-doris/baidu/bdg/doris-baidu/core/be/src/exec/hash_table.cpp:267
@@ -306,22 +323,26 @@ $ addr2line -e lib/doris_be 0x2af6f63 0x2c4095b
0x134d278 0x134bdcb 0x133d105 0
thread.cpp:?
```
-有时内存的申请并不是大内存的申请导致,而是通过小内存不断的堆积导致的。那么就没有办法通过查看日志定位到具体的申请信息,那么就需要通过其他方式来获得信息。
+有时内存申请并非由大内存申请导致,而是通过小内存不断堆积导致。这种情况下无法通过查看日志定位具体的申请信息,就需要通过其他方式来获取信息。
-这个时候我们可以利用TCMalloc的[HEAP
PROFILE](https://gperftools.github.io/gperftools/heapprofile.html)的功能。如果设置了HEAPPROFILE功能,那么我们可以获得进程整体的内存申请使用情况。使用方式是在启动Doris
BE前设置`HEAPPROFILE`环境变量。比如:
+这时可以利用 TCMalloc 的 [HEAP
PROFILE](https://gperftools.github.io/gperftools/heapprofile.html) 功能。设置
HEAPPROFILE 功能后,可以获得进程整体的内存申请使用情况。使用方式是在启动 Doris BE 前设置 `HEAPPROFILE` 环境变量。例如:
-```
-export TCMALLOC_SAMPLE_PARAMETER=64000 HEAP_PROFILE_ALLOCATION_INTERVAL=-1
HEAP_PROFILE_INUSE_INTERVAL=-1 HEAP_PROFILE_TIME_INTERVAL=5
HEAPPROFILE=/tmp/doris_be.hprof
+```bash
+export TCMALLOC_SAMPLE_PARAMETER=64000 HEAP_PROFILE_ALLOCATION_INTERVAL=-1 HEAP_PROFILE_INUSE_INTERVAL=-1 HEAP_PROFILE_TIME_INTERVAL=5 HEAPPROFILE=/tmp/doris_be.hprof
./bin/start_be.sh --daemon
```
-> 需要注意,HEAPPROFILE 需要是绝对路径,且已经存在。
+> **注意:** HEAPPROFILE 需要是绝对路径,且目录必须已经存在。
-这样,当满足HEAPPROFILE的dump条件时,就会将内存的整体使用情况写到指定路径的文件中。后续我们就可以通过使用`pprof`工具来对输出的内容进行分析。
+这样,当满足 HEAPPROFILE 的 Dump 条件时,就会将内存的整体使用情况写入到指定路径的文件中。后续可以使用 `pprof`
工具对输出的内容进行分析。
+```bash
+pprof --text lib/doris_be /tmp/doris_be.hprof.0012.heap | head -30
```
-$ pprof --text lib/doris_be /tmp/doris_be.hprof.0012.heap | head -30
+输出示例:
+
+```text
Using local file lib/doris_be.
Using local file /tmp/doris_be.hprof.0012.heap.
Total: 668.6 MB
@@ -338,30 +359,35 @@ Total: 668.6 MB
1.7 0.3% 98.4% 1.7 0.3% doris::SegmentReader::_load_index
```
-上述文件各个列的内容:
+**各列的含义:**
-* 第一列:函数直接申请的内存大小,单位MB
-* 第四列:函数以及函数所有调用的函数总共内存大小。
-* 第二列、第五列分别是第一列与第四列的比例值。
-* 第三列是个第二列的累积值。
+- **第一列**:函数直接申请的内存大小,单位 MB。
+- **第二列**:第一列的百分比。
+- **第三列**:第二列的累积值。
+- **第四列**:函数及其所有调用的函数总共占用的内存大小,单位 MB。
+- **第五列**:第四列的百分比。
-当然也可以生成调用关系图片,更加方便分析。比如下面的命令就能够生成SVG格式的调用关系图。
+当然也可以生成调用关系图片,更加方便分析。例如下面的命令可以生成 SVG 格式的调用关系图:
-```
-pprof --svg lib/doris_be /tmp/doris_be.hprof.0012.heap > heap.svg
+```bash
+pprof --svg lib/doris_be /tmp/doris_be.hprof.0012.heap > heap.svg
```
-**注意:开启这个选项是要影响程序的执行性能的,请慎重对线上的实例开启**
+**性能提示:** 开启该选项会影响程序的执行性能,请慎重对线上实例开启。
-##### pprof remote server
+##### pprof Remote Server
-HEAP PROFILE虽然能够获得全部的内存使用信息,但是也有比较受限的地方。1. 需要重启BE进行。2.
需要一直开启这个命令,导致对整个进程的性能造成影响。
+HEAP PROFILE 虽然能够获得全部的内存使用信息,但也有一些限制:1. 需要重启 BE;2. 需要一直开启该功能,导致对进程性能造成持续影响。
-对Doris BE也可以使用动态开启、关闭heap
profile的方式来对进程进行内存申请分析。Doris内部支持了GPerftools的[远程server调试](https://gperftools.github.io/gperftools/pprof_remote_servers.html)。那么可以通过`pprof`直接对远程运行的Doris
BE进行动态的HEAP PROFILE。比如我们可以通过以下命令来查看Doris的内存的使用增量
+对 Doris BE 可以使用动态开启、关闭 heap profile 的方式来分析进程的内存申请情况。Doris 内部支持了 GPerftools
的[远程 server
调试](https://gperftools.github.io/gperftools/pprof_remote_servers.html)。可以通过
`pprof` 工具直接对远程运行的 Doris BE 进行动态的 HEAP PROFILE。例如,通过以下命令查看 Doris 的内存使用增量:
+```bash
+pprof --text --seconds=60 http://be_host:be_webport/pprof/heap
```
-$ pprof --text --seconds=60 http://be_host:be_webport/pprof/heap
+输出示例:
+
+```text
Total: 1296.4 MB
484.9 37.4% 37.4% 484.9 37.4% doris::StorageByteBuffer::create
272.2 21.0% 58.4% 273.3 21.1% doris::RowBlock::init
@@ -378,27 +404,27 @@ Total: 1296.4 MB
10.0 0.8% 93.4% 10.0 0.8%
doris::PlainTextLineReader::PlainTextLineReader
```
-这个命令的输出与HEAP PROFILE的输出及查看方式一样,这里就不再详细说明。这个命令只有在执行的过程中才会开启统计,相比HEAP
PROFILE对于进程性能的影响有限。
+这个命令的输出和查看方式与 HEAP PROFILE 的输出一致。该命令只在执行过程中开启统计,相比 HEAP PROFILE 对进程性能的影响更小。
-#### LSAN
+#### LSAN(内存泄漏检测工具)
-[LSAN](https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer)是一个地址检查工具,GCC已经集成。在我们编译代码的时候开启相应的编译选项,就能够开启这个功能。当程序发生可以确定的内存泄露时,会将泄露堆栈打印。Doris
BE已经集成了这个工具,只需要在编译的时候使用如下的命令进行编译就能够生成带有内存泄露检测版本的BE二进制
+[LSAN](https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer)
是一个地址检查工具,GCC 已经集成。在编译代码时开启相应的编译选项,就能够开启该功能。当程序发生可以确定的内存泄漏时,会将泄漏堆栈打印出来。Doris
BE 已经集成了该工具,只需在编译时使用如下命令即可生成带有内存泄漏检测版本的 BE 二进制:
-```
+```bash
BUILD_TYPE=LSAN ./build.sh
```
-当系统检测到内存泄露的时候,就会在be.out里面输出对应的信息。为了下面的演示,我们故意在代码中插入一段内存泄露代码。我们在`StorageEngine`的`open`函数中插入如下代码
+当系统检测到内存泄漏时,就会在 `be.out` 中输出对应的信息。为了演示,我们故意在代码中插入一段内存泄漏代码。在 `StorageEngine` 的
`open` 函数中插入如下代码:
-```
- char* leak_buf = new char[1024];
- strcpy(leak_buf, "hello world");
- LOG(INFO) << leak_buf;
+```cpp
+char* leak_buf = new char[1024];
+strcpy(leak_buf, "hello world");
+LOG(INFO) << leak_buf;
```
-我们就在be.out中获得了如下的输出
+然后在 `be.out` 中就能获得如下输出:
-```
+```text
=================================================================
==24732==ERROR: LeakSanitizer: detected memory leaks
@@ -411,33 +437,33 @@ Direct leak of 1024 byte(s) in 1 object(s) allocated from:
SUMMARY: LeakSanitizer: 1024 byte(s) leaked in 1 allocation(s).
```
-从上述的输出中,我们能看到有1024个字节被泄露了,并且打印出来了内存申请时的堆栈信息。
+从上述输出中可以看到有 1024 个字节被泄漏,并且打印出了内存申请时的堆栈信息。
-**注意:开启这个选项是要影响程序的执行性能的,请慎重对线上的实例开启**
+**性能提示:** 开启该选项会影响程序的执行性能,请慎重对线上实例开启。
-**注意:如果开启了LSAN开关的话,tcmalloc就会被自动关闭**
+**注意:** 开启 LSAN 后,TCMalloc 会被自动关闭。
-#### ASAN
+#### ASAN(地址合法性检测工具)
-除了内存使用不合理、泄露以外。有的时候也会发生内存访问非法地址等错误。这个时候我们可以借助[ASAN](https://github.com/google/sanitizers/wiki/AddressSanitizer)来辅助我们找到问题的原因。与LSAN一样,ASAN也集成在了GCC中。Doris通过如下的方式进行编译就能够开启这个功能
+除了内存使用不合理、泄漏以外,有时也会发生内存访问非法地址等错误。这时可以借助
[ASAN](https://github.com/google/sanitizers/wiki/AddressSanitizer) 来帮助找到问题的原因。与
LSAN 一样,ASAN 也集成在了 GCC 中。Doris 通过如下方式进行编译就能开启该功能:
-```
+```bash
BUILD_TYPE=ASAN ./build.sh
```
-执行编译生成的二进制文件,当检测工具发现有异常访问时,就会立即退出,并将非法访问的堆栈输出在be.out中。对于ASAN的输出与LSAN是一样的分析方法。这里我们也主动注入一个地址访问错误,来展示下具体的内容输出。我们仍然在`StorageEngine`的`open`函数中注入一段非法内存访问,具体的错误代码如下
+执行编译生成的二进制文件后,当检测工具发现异常访问时,就会立即退出,并将非法访问的堆栈输出在 `be.out` 中。对于 ASAN 的输出与 LSAN
使用相同的分析方法。为了演示,我们主动注入一个地址访问错误。仍然在 `StorageEngine` 的 `open` 函数中注入一段非法内存访问代码:
-```
- char* invalid_buf = new char[1024];
- for (int i = 0; i < 1025; ++i) {
- invalid_buf[i] = i;
- }
- LOG(INFO) << invalid_buf;
+```cpp
+char* invalid_buf = new char[1024];
+for (int i = 0; i < 1025; ++i) {
+ invalid_buf[i] = i;
+}
+LOG(INFO) << invalid_buf;
```
-然后我们就会在be.out中获得如下的输出
+然后在 `be.out` 中就会获得如下输出:
-```
+```text
=================================================================
==23284==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x61900008bf80 at pc 0x00000129f56a bp 0x7fff546eed90 sp 0x7fff546eed88
WRITE of size 1 at 0x61900008bf80 thread T0
@@ -456,66 +482,69 @@ allocated by thread T0 here:
SUMMARY: AddressSanitizer: heap-buffer-overflow
/home/ssd0/zc/palo/doris/core/be/src/olap/storage_engine.cpp:106 in
doris::StorageEngine::open(doris::EngineOptions const&, doris::StorageEngine**)
```
-从这段信息中该可以看到在`0x61900008bf80`这个地址我们尝试去写一个字节,但是这个地址是非法的。我们也可以看到
`[0x61900008bb80,0x61900008bf80)`这个地址的申请堆栈。
+从这段信息中可以看到在 `0x61900008bf80` 这个地址尝试写入一个字节,但该地址是非法的。同时也可以看到
`[0x61900008bb80,0x61900008bf80)` 这个地址区域的申请堆栈。
-**注意:开启这个选项是要影响程序的执行性能的,请慎重对线上的实例开启**
+**性能提示:** 开启该选项会影响程序的执行性能,请慎重对线上实例开启。
-**注意:如果开启了ASAN开关的话,tcmalloc就会被自动关闭**
+**注意:** 开启 ASAN 后,TCMalloc 会被自动关闭。
-另外,如果be.out中输出了堆栈信息,但是并没有函数符号,那么这个时候需要我们手动的处理下才能获得可读的堆栈信息。具体的处理方法需要借助一个脚本来解析ASAN的输出。这个时候我们需要使用[asan_symbolize](https://llvm.org/svn/llvm-project/compiler-rt/trunk/lib/asan/scripts/asan_symbolize.py)来帮忙解析下。具体的使用方式如下:
+另外,如果 `be.out` 中输出的堆栈信息没有函数符号,需要手动处理才能获得可读的堆栈信息。可以使用
[asan_symbolize](https://llvm.org/svn/llvm-project/compiler-rt/trunk/lib/asan/scripts/asan_symbolize.py)
脚本来解析 ASAN 的输出,具体使用方式如下:
-```
+```bash
cat be.out | python asan_symbolize.py | c++filt
```
-通过上述的命令,我们就能够获得可读的堆栈信息了。
+通过上述命令就能获得可读的堆栈信息。
-### CPU
+### CPU 调试
-当系统的CPU
Idle很低的时候,说明系统的CPU已经成为了主要瓶颈,这个时候就需要分析一下当前的CPU使用情况。对于Doris的BE可以有如下两种方式来分析Doris的CPU瓶颈。
+当系统的 CPU Idle 很低时,说明 CPU 已经成为主要瓶颈,这时需要分析当前的 CPU 使用情况。对于 Doris BE,有以下两种方式来分析
CPU 瓶颈。
#### pprof
-[pprof](https://github.com/google/pprof):
来自gperftools,用于将gperftools所产生的内容转化成便于人可以阅读的格式,比如pdf, svg, text等.
+[pprof](https://github.com/google/pprof) 来自 gperftools,用于将 gperftools
产生的内容转换成便于阅读的格式,如 PDF、SVG、Text 等。
-由于Doris内部已经集成了并兼容了GPerf的REST接口,那么用户可以通过`pprof`工具来分析远程的Doris BE。具体的使用方式如下:
+由于 Doris 内部已集成并兼容了 GPerf 的 REST 接口,可以通过 `pprof` 工具分析远程的 Doris BE。具体使用方式如下:
-```
-pprof --svg --seconds=60 http://be_host:be_webport/pprof/profile > be.svg
+```bash
+pprof --svg --seconds=60 http://be_host:be_webport/pprof/profile > be.svg
```
-这样就能够生成一张BE执行的CPU消耗图。
+该命令会生成一张 BE 执行的 CPU 消耗图。

-#### perf + flamegragh
+#### perf + FlameGraph
-这个是相当通用的一种CPU分析方式,相比于`pprof`,这种方式必须要求能够登陆到分析对象的物理机上。但是相比于pprof只能定时采点,perf是能够通过不同的事件来完成堆栈信息采集的。具体的使用方式如下:
+这是一种非常通用的 CPU 分析方式。相比 `pprof`,这种方式必须要求能够登录到分析对象的物理机上。但相比 pprof 只能定时采样,perf
能够通过不同的事件来完成堆栈信息采集。
-[perf](https://perf.wiki.kernel.org/index.php/Main_Page):
linux内核自带性能分析工具。[这里](http://www.brendangregg.com/perf.html)有一些perf的使用例子。
+**工具介绍:**
-[FlameGraph](https://github.com/brendangregg/FlameGraph):
可视化工具,用于将perf的输出以火焰图的形式展示出来。
+- [perf](https://perf.wiki.kernel.org/index.php/Main_Page):Linux
内核自带的性能分析工具。[这里](http://www.brendangregg.com/perf.html)有一些 perf 的使用示例。
+- [FlameGraph](https://github.com/brendangregg/FlameGraph):可视化工具,用于将 perf
的输出以火焰图的形式展示。
-```
+**使用方法:**
+
+```bash
perf record -g -p be_pid -- sleep 60
```
-这条命令会统计60秒钟BE的CPU运行情况,并且生成perf.data。对于perf.data的分析,可以通过perf的命令来进行分析
+该命令会统计 60 秒钟 BE 的 CPU 运行情况,并生成 `perf.data` 文件。对于 `perf.data` 的分析,可以通过 perf
命令进行:
-```
+```bash
perf report
```
-分析得到如下的图片
+分析得到的示例:

-来对生成的内容进行分析。当然也可以使用flamegragh完成可视化展示。
+当然也可以使用 FlameGraph 进行可视化展示:
-```
+```bash
perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > be.svg
```
-这样也会生成一张当时运行的CPU消耗图。
+这样也会生成一张当时运行的 CPU 消耗图。
