This is an automated email from the ASF dual-hosted git repository.
zclllyybb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 0ca927e21fe [feature](be) Add BE stack trace HTTP API (#3945)
0ca927e21fe is described below
commit 0ca927e21fea34ab3a9662b5c7ac50c139fb587e
Author: zclllyybb <[email protected]>
AuthorDate: Mon Jun 22 14:58:20 2026 +0800
[feature](be) Add BE stack trace HTTP API (#3945)
docs of https://github.com/apache/doris/pull/64454
---
community/developer-guide/be-stack-trace.md | 143 +++++++++++++++++++++
.../current/developer-guide/be-stack-trace.md | 143 +++++++++++++++++++++
sidebarsCommunity.json | 1 +
3 files changed, 287 insertions(+)
diff --git a/community/developer-guide/be-stack-trace.md b/community/developer-guide/be-stack-trace.md
new file mode 100644
index 00000000000..bba34b57c3c
--- /dev/null
+++ b/community/developer-guide/be-stack-trace.md
@@ -0,0 +1,143 @@
+---
+title: BE Thread Stack Trace
+language: en
+description: Collect live thread stack traces from an Apache Doris BE process through the BE HTTP API.
+keywords:
+ - Apache Doris
+ - BE stack trace
+ - thread stack
+ - troubleshooting
+ - debug
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# BE Thread Stack Trace
+
+<!-- Knowledge type: Tool usage -->
+<!-- Applicable scenarios: Troubleshooting -->
+
+## Purpose
+
+The BE thread stack trace API collects live thread stacks from a running Apache Doris BE process through HTTP. Use it when a BE is slow, blocked, or consuming unexpected CPU. The response is plain text and only reflects the target BE.
+
+This API is supported on Linux BE processes.
+
+## Quick Start
+
+1. Find the target BE HTTP endpoint:
+
+ ```sql
+ SHOW BACKENDS;
+ ```
+
+ Use the `Host` and `HttpPort` fields of the target BE.
+
+2. Collect stacks from all BE threads:
+
+ ```bash
+ curl -s "http://{be_host}:{be_webport}/api/stack_trace" -o be_stack_trace.txt
+ ```
+
+3. Collect stacks from specific thread IDs:
+
+ Thread IDs can be obtained from a previous full-stack response or from OS tools such as `top -H -p <be_pid>`.
+
+ ```bash
+ curl -s "http://{be_host}:{be_webport}/api/stack_trace?thread_id=12345,12346&mode=FAST&timeout_ms=3000" -o be_stack_trace_tid.txt
+ ```
+
+4. Use conservative mode when you want to avoid signaling threads that are blocked in interrupt-sensitive syscalls:
+
+ ```bash
+ curl -s "http://{be_host}:{be_webport}/api/stack_trace?skip_blocking_syscalls=true" -o be_stack_trace_safe.txt
+ ```
+
+## Parameters
+
+All parameters are optional query parameters of `GET /api/stack_trace`.
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `thread_id` | None | Comma-separated Linux thread IDs. If omitted, all BE threads are sampled. |
+| `tid` | None | Legacy alias of `thread_id`. Do not use it together with `thread_id`. |
+| `mode` | `FAST` | Alias of `dwarf_location_info_mode`. Values: `DISABLED`, `FAST`, `FULL`, `FULL_WITH_INLINE`. |
+| `dwarf_location_info_mode` | `FAST` | Controls DWARF symbolization detail. If both `mode` and this parameter are set, this parameter is used. |
+| `timeout_ms` | `100` | Timeout in milliseconds for each thread. Valid range: `1` to `10000`. |
+| `skip_blocking_syscalls` | `false` | Whether to skip threads blocked in syscalls such as `read`, `poll`, `select`, `epoll_wait`, `futex`, and `nanosleep`. Values: `true`, `false`, `1`, `0`. |
+
+Recommended symbolization modes:
+
+| Mode | Usage |
+|------|-------|
+| `DISABLED` | Fastest mode. Use it when you only need thread names, statuses, and raw frames. |
+| `FAST` | Default mode. Suitable for most troubleshooting. |
+| `FULL` | More complete file and line information. Use only when `FAST` is not enough. |
+| `FULL_WITH_INLINE` | Includes inline frame information. This is the most expensive mode. |
+
+## Output
+
+The response starts with collection metadata, followed by one section per thread and a final summary line.
+
+Example:
+
+```text
+BE thread stack traces
+pid: 96543
+service_signal: 40
+thread_count: 1168
+timeout_ms_per_thread: 100
+dwarf_location_info_mode: fast
+skip_blocking_syscalls: false
+signal_handler_unwinder: frame_pointer_with_coordinator_signal_context_libunwind_fallback
+
+----- thread ... (p_normal_simple) status=ok capture_method=frame_pointer frames=29 fp_status=end_of_chain ... -----
+ 0# doris::md5_hex_batch(...)
+ 1# doris::FunctionStringDigestMulti<doris::MD5Sum>::vector_execute_single_md5(...)
+ ...
+ 20# doris::AggSinkLocalState::_execute_without_key(doris::Block*)
+ 21# doris::AggSinkLocalState::Executor<true, false>::execute(...)
+ 22# doris::AggSinkOperatorX::sink_impl(...)
+ 23# doris::DataSinkOperatorXBase::sink(...)
+ 24# doris::PipelineTask::execute(bool*)
+ 25# doris::TaskScheduler::_do_work(int)
+
+summary: captured=1160 skipped=4 timed_out=0 remote_signal_attempts=1164
+```
+
+Common thread statuses:
+
+| Status | Meaning |
+|--------|---------|
+| `ok` | Stack was captured successfully. |
+| `ok_current_thread` | Stack was captured from the HTTP handler thread itself. |
+| `skipped_blocking_syscall` | The thread was skipped because `skip_blocking_syscalls=true` and the thread was in an interrupt-sensitive syscall. |
+| `signal_blocked` | The target thread blocks the diagnostic signal. |
+| `thread_exited` | The specified thread exited before it was sampled. |
+| `signal_error` | Sending the diagnostic signal failed. |
+| `timeout` | Stack capture did not finish within `timeout_ms`. |
+
+## Notes
+
+- The default full-process request samples all BE threads and does not skip blocking syscalls. This preserves blocked worker stacks, which are often useful during incidents.
+- This is not a stop-the-world snapshot. Threads are sampled one by one, so their stacks are close in time but not from exactly the same instant.
+- Only one stack trace request can run in a BE process at a time. Concurrent requests return HTTP `409`.
+- Invalid parameters return HTTP `400`. Non-Linux BE processes return HTTP `501`.
+- Prefer `thread_id` filtering and `mode=FAST` for routine diagnosis. Use `FULL` or `FULL_WITH_INLINE` only when more detailed symbolization is needed.
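Note on the Output section of be-stack-trace.md above: the per-thread `status=` fields and the final `summary:` line lend themselves to a quick tally once the response has been saved to a file. The following is a minimal sketch, not part of the commit. The function names `count_statuses` and `summary_field` are hypothetical, and the sketch assumes that each thread header is a single line starting with `----- thread` and that the summary line uses `key=value` pairs, as in the example output.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, and the line layout is
# assumed from the example output in be-stack-trace.md.

# Print "<count> <status>" for each thread status, most frequent first.
count_statuses() {
    # Match " status=" with a leading space so that "fp_status=" is ignored.
    grep '^----- thread' "$1" \
        | grep -o ' status=[a-z_]*' \
        | cut -d= -f2 \
        | sort | uniq -c | sort -rn \
        | awk '{print $1, $2}'
}

# Print the value of one key (for example "timed_out") from the summary line.
summary_field() {
    awk -v key="$2" '/^summary:/ {
        for (i = 2; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == key) print kv[2]
        }
    }' "$1"
}
```

For example, `count_statuses be_stack_trace.txt` lists how many threads ended in each status, and `summary_field be_stack_trace.txt timed_out` prints the number of threads that hit `timeout_ms`.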
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/developer-guide/be-stack-trace.md b/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/developer-guide/be-stack-trace.md
new file mode 100644
index 00000000000..353d053442b
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/developer-guide/be-stack-trace.md
@@ -0,0 +1,143 @@
+---
+title: BE 线程堆栈
+language: zh-CN
+description: 通过 Apache Doris BE HTTP API 采集运行中 BE 进程的实时线程堆栈。
+keywords:
+ - Apache Doris
+ - BE 线程堆栈
+ - BE 打栈
+ - 故障排查
+ - 调试
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# BE 线程堆栈
+
+<!-- 知识类型: 工具使用 -->
+<!-- 适用场景: 故障排查 -->
+
+## 用途
+
+BE 线程堆栈 API 用于通过 HTTP 采集运行中 Apache Doris BE 进程的实时线程堆栈。BE 出现响应慢、线程阻塞或 CPU 异常消耗时,可使用该接口辅助定位问题。接口返回纯文本结果,且只采集目标 BE。
+
+该 API 仅支持 Linux BE 进程。
+
+## 快速使用
+
+1. 查询目标 BE 的 HTTP 入口:
+
+ ```sql
+ SHOW BACKENDS;
+ ```
+
+ 使用目标 BE 的 `Host` 与 `HttpPort` 字段。
+
+2. 采集所有 BE 线程堆栈:
+
+ ```bash
+ curl -s "http://{be_host}:{be_webport}/api/stack_trace" -o be_stack_trace.txt
+ ```
+
+3. 采集指定线程 ID 的堆栈:
+
+ 线程 ID 可从一次全量打栈结果中获取,也可通过 `top -H -p <be_pid>` 等系统工具查看。
+
+ ```bash
+ curl -s "http://{be_host}:{be_webport}/api/stack_trace?thread_id=12345,12346&mode=FAST&timeout_ms=3000" -o be_stack_trace_tid.txt
+ ```
+
+4. 如需避免向处于易被中断系统调用中的线程发送信号,可使用保守模式:
+
+ ```bash
+ curl -s "http://{be_host}:{be_webport}/api/stack_trace?skip_blocking_syscalls=true" -o be_stack_trace_safe.txt
+ ```
+
+## 参数说明
+
+所有参数都是 `GET /api/stack_trace` 的可选查询参数。
+
+| 参数 | 默认值 | 说明 |
+|------|--------|------|
+| `thread_id` | 无 | Linux 线程 ID,多个 ID 用英文逗号分隔。不指定时采集所有 BE 线程。 |
+| `tid` | 无 | `thread_id` 的兼容别名,不能与 `thread_id` 同时使用。 |
+| `mode` | `FAST` | `dwarf_location_info_mode` 的别名。可选值:`DISABLED`、`FAST`、`FULL`、`FULL_WITH_INLINE`。 |
+| `dwarf_location_info_mode` | `FAST` | 控制 DWARF 符号化详情。如果同时设置 `mode` 与该参数,以该参数为准。 |
+| `timeout_ms` | `100` | 单个线程的采集超时时间,单位为毫秒。有效范围:`1` 到 `10000`。 |
+| `skip_blocking_syscalls` | `false` | 是否跳过处于 `read`、`poll`、`select`、`epoll_wait`、`futex`、`nanosleep` 等系统调用中的线程。可选值:`true`、`false`、`1`、`0`。 |
+
+符号化模式建议:
+
+| 模式 | 使用场景 |
+|------|----------|
+| `DISABLED` | 最快模式。只需要线程名、状态与原始栈帧时使用。 |
+| `FAST` | 默认模式,适合大多数排查场景。 |
+| `FULL` | 输出更完整的文件与行号信息,仅在 `FAST` 信息不足时使用。 |
+| `FULL_WITH_INLINE` | 包含 inline frame 信息,开销最大。 |
+
+## 输出说明
+
+接口返回内容包含采集元信息、每个线程的堆栈段,以及最后的汇总行。
+
+示例:
+
+```text
+BE thread stack traces
+pid: 96543
+service_signal: 40
+thread_count: 1168
+timeout_ms_per_thread: 100
+dwarf_location_info_mode: fast
+skip_blocking_syscalls: false
+signal_handler_unwinder: frame_pointer_with_coordinator_signal_context_libunwind_fallback
+
+----- thread ... (p_normal_simple) status=ok capture_method=frame_pointer frames=29 fp_status=end_of_chain ... -----
+ 0# doris::md5_hex_batch(...)
+ 1# doris::FunctionStringDigestMulti<doris::MD5Sum>::vector_execute_single_md5(...)
+ ...
+ 20# doris::AggSinkLocalState::_execute_without_key(doris::Block*)
+ 21# doris::AggSinkLocalState::Executor<true, false>::execute(...)
+ 22# doris::AggSinkOperatorX::sink_impl(...)
+ 23# doris::DataSinkOperatorXBase::sink(...)
+ 24# doris::PipelineTask::execute(bool*)
+ 25# doris::TaskScheduler::_do_work(int)
+
+summary: captured=1160 skipped=4 timed_out=0 remote_signal_attempts=1164
+```
+
+常见线程状态:
+
+| 状态 | 含义 |
+|------|------|
+| `ok` | 成功采集堆栈。 |
+| `ok_current_thread` | 采集的是 HTTP 处理线程自身的堆栈。 |
+| `skipped_blocking_syscall` | 设置了 `skip_blocking_syscalls=true`,且线程处于易被中断的系统调用中,因此被跳过。 |
+| `signal_blocked` | 目标线程屏蔽了诊断信号。 |
+| `thread_exited` | 指定线程在采集前已退出。 |
+| `signal_error` | 发送诊断信号失败。 |
+| `timeout` | 在 `timeout_ms` 内未完成堆栈采集。 |
+
+## 注意事项
+
+- 默认全量采集会采集所有 BE 线程,且不会跳过阻塞系统调用。这样可以保留事故排查中常见的阻塞工作线程堆栈。
+- 该接口不是 stop-the-world 快照。线程会被逐个采样,因此堆栈时间接近,但不保证来自同一瞬间。
+- 单个 BE 进程同一时间只允许一个打栈请求。并发请求会返回 HTTP `409`。
+- 参数非法时返回 HTTP `400`。非 Linux BE 进程会返回 HTTP `501`。
+- 日常排查优先使用 `thread_id` 缩小范围,并使用 `mode=FAST`。仅在需要更多符号化信息时使用 `FULL` 或 `FULL_WITH_INLINE`。
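Note on the Parameters table in both be-stack-trace.md files: `mode` accepts four values, `timeout_ms` must lie between `1` and `10000`, and invalid parameters return HTTP `400`. A script that calls the API can check these constraints before sending the request. The sketch below is an illustration under those documented constraints, not code from the commit. The function name `build_stack_trace_url` and its argument order are hypothetical, and it always sends `mode` and `timeout_ms` explicitly, using the documented defaults when they are omitted.

```shell
#!/usr/bin/env bash
# Sketch only: build_stack_trace_url is a hypothetical helper.
# Usage: build_stack_trace_url <be_host> <be_webport> [thread_ids] [mode] [timeout_ms]
build_stack_trace_url() {
    local host="$1"
    local port="$2"
    local tids="${3:-}"
    local mode="${4:-FAST}"      # documented default
    local timeout="${5:-100}"    # documented default, in milliseconds

    # Only the four documented symbolization modes are accepted.
    case "$mode" in
        DISABLED|FAST|FULL|FULL_WITH_INLINE) ;;
        *) echo "invalid mode: $mode" >&2; return 1 ;;
    esac

    # timeout_ms must be a plain integer in the range 1..10000.
    case "$timeout" in
        ''|*[!0-9]*) echo "invalid timeout_ms: $timeout" >&2; return 1 ;;
    esac
    if [ "$timeout" -lt 1 ] || [ "$timeout" -gt 10000 ]; then
        echo "timeout_ms out of range: $timeout" >&2
        return 1
    fi

    local url="http://${host}:${port}/api/stack_trace?mode=${mode}&timeout_ms=${timeout}"
    # thread_id is optional; without it the BE samples all threads.
    if [ -n "$tids" ]; then
        url="${url}&thread_id=${tids}"
    fi
    printf '%s\n' "$url"
}
```

The result can be passed straight to curl, for example `curl -s "$(build_stack_trace_url "$be_host" "$be_webport" 12345,12346 FAST 3000)" -o be_stack_trace_tid.txt`, where `be_host` and `be_webport` are taken from `SHOW BACKENDS`.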
diff --git a/sidebarsCommunity.json b/sidebarsCommunity.json
index 6d55b2bc44c..82983e2c1f9 100644
--- a/sidebarsCommunity.json
+++ b/sidebarsCommunity.json
@@ -73,6 +73,7 @@
"label": "Debugging & Profiling",
"items": [
"developer-guide/debug-tool",
+ "developer-guide/be-stack-trace",
"developer-guide/arthas",
"developer-guide/fe-profiler",
"developer-guide/pipeline-tracing",