This is an automated email from the ASF dual-hosted git repository.
zclllyybb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 0141aa20221 [feature](be) Add BE stack trace HTTP API (#64454)
0141aa20221 is described below
commit 0141aa20221c19c7b77290e948e3da7e795466e3
Author: zclllyybb <[email protected]>
AuthorDate: Thu Jun 18 10:11:01 2026 +0800
[feature](be) Add BE stack trace HTTP API (#64454)
This PR adds a BE-side diagnostic HTTP API for collecting thread stack
traces from a live BE process:
```text
GET /api/stack_trace
GET /api/stack_trace?thread_id=<tid>[,<tid>...]
GET /api/stack_trace?mode=<DISABLED|FAST|FULL|FULL_WITH_INLINE>
GET
/api/stack_trace?dwarf_location_info_mode=<DISABLED|FAST|FULL|FULL_WITH_INLINE>
GET /api/stack_trace?timeout_ms=<ms>
GET /api/stack_trace?skip_blocking_syscalls=<true|false>
```
`thread_id` is the preferred explicit selector. `tid` is kept only as a
legacy alias.
The default full-process collection samples all BE threads without a
signal-attempt cap and does not skip blocking syscalls. This is
intentional: during real incidents, blocked worker stacks are often the
useful part of the dump. `skip_blocking_syscalls=true` remains available
as an explicit conservative mode for isolating EINTR-sensitive paths.
The output is plain text and includes:
- BE pid, diagnostic signal number, thread count, timeout, DWARF mode,
and syscall-skip mode.
- One section per thread, with TID, thread name, capture status, capture
method, frame count, frame-pointer status, and stack bounds.
- A summary line with captured, skipped, timed-out, signal-error, and
exited counts.
Example header:
```text
BE thread stack traces
pid: 96543
service_signal: 40
thread_count: 1168
timeout_ms_per_thread: 3000
dwarf_location_info_mode: fast
skip_blocking_syscalls: false
signal_handler_unwinder:
frame_pointer_with_coordinator_signal_context_libunwind_fallback
```
Example execution stack captured under load:
```text
----- thread 12438 (p_normal_simple) status=ok capture_method=frame_pointer
frames=32 fp_status=end_of_chain stack_bounds=... -----
doris::CastToStringFunction::execute_impl(...)
doris::VectorizedFnCall::execute_column_impl(...)
doris::AggSinkLocalState::_execute_without_key(...)
doris::AggSinkOperatorX::sink_impl(...)
doris::DataSinkOperatorXBase::sink(...)
doris::PipelineTask::execute(bool*)
doris::TaskScheduler::_do_work(int)
```
Implementation notes:
- The remote-thread capture path uses a signal handler to copy only
lightweight frame-pointer/context data from the target thread.
Symbolization and fallback unwinding run outside the handler.
- The handler publishes through one process-wide slot, so HTTP
collection is serialized. The coordinator waits until the previous
handler has fully released its latch before signaling the next TID;
otherwise back-to-back captures can drop the next signal.
- Stack bounds are collected before sending any signal, because the
handler must not open `/proc`, allocate memory, or take locks while
interrupting an arbitrary BE thread.
- Frame-pointer walking validates stack-range bounds and pointer
alignment before reading frame records.
- The same PCs can be rendered with different DWARF modes, so the
StackTrace cache key includes `dwarf_location_info_mode`. This prevents
a `DISABLED` rendering from poisoning a later `FULL` request and
prevents full file/line detail from leaking into a disabled request.
- Instruction addresses are compared via `uintptr_t` instead of pointer
arithmetic on non-object instruction addresses.
- Threads that block the diagnostic signal are reported as
`status=signal_blocked` instead of being treated as ambiguous timeouts.
- This is not a global stop-the-world snapshot. Each thread is sampled
independently as the coordinator walks the TID list, so stacks are close
in time but not guaranteed to be from the exact same instant.
EINTR handling:
- Sending a diagnostic signal can interrupt blocking syscalls even when
`SA_RESTART` is used, especially for poll/select/accept-style waits.
- Doris Thrift servers previously could treat `Interrupted system call`
from `TServerSocket::acceptImpl()` as a fatal server-loop exception.
- This PR wraps BE Thrift server sockets with `ImprovedServerSocket`,
which retries only the narrow EINTR/`Interrupted system call` case for
accept/read/peek. Other transport exceptions keep the original behavior.
---
be/CMakeLists.txt | 16 +-
be/src/common/stack_trace.cpp | 30 +-
.../service/http/action/be_thread_stack_action.cpp | 1215 ++++++++++++++++++++
.../service/http/action/be_thread_stack_action.h | 43 +
be/src/service/http_service.cpp | 5 +
be/src/util/thrift_server.cpp | 85 +-
.../service/http/be_thread_stack_action_test.cpp | 437 +++++++
.../get/test_be_stack_trace_api.groovy | 169 +++
8 files changed, 1990 insertions(+), 10 deletions(-)
diff --git a/be/CMakeLists.txt b/be/CMakeLists.txt
index 74c0e998226..0df7454eeac 100644
--- a/be/CMakeLists.txt
+++ b/be/CMakeLists.txt
@@ -257,7 +257,19 @@ SET(CONTRIB_PATH "${PROJECT_SOURCE_DIR}/../contrib")
# Out of source build need to set the binary dir
add_subdirectory(${CONTRIB_PATH}/apache-orc ${PROJECT_BINARY_DIR}/apache-orc
EXCLUDE_FROM_ALL)
-target_compile_options(orc PRIVATE -Wno-implicit-fallthrough -w)
+target_compile_options(orc PRIVATE -fno-omit-frame-pointer
-Wno-implicit-fallthrough -w)
+foreach(orc_tool
+ orc-contents
+ orc-metadata
+ orc-statistics
+ orc-scan
+ orc-memory
+ timezone-dump
+ csv-import)
+ if (TARGET ${orc_tool})
+ target_compile_options(${orc_tool} PRIVATE -fno-omit-frame-pointer)
+ endif()
+endforeach()
option(BUILD_STATIC_LIBRARIES "Build static libraries" ON)
option(BUILD_SHARED_LIBRARIES "Build shared libraries" OFF)
@@ -273,7 +285,7 @@ set(USE_BTHREAD OFF)
# Out of source build need to set the binary dir
add_subdirectory(${CONTRIB_PATH}/clucene ${PROJECT_BINARY_DIR}/clucene
EXCLUDE_FROM_ALL)
-set(clucene_options -w -Wall)
+set(clucene_options -fno-omit-frame-pointer -w -Wall)
if (COMPILER_CLANG)
set(clucene_options ${clucene_options} -Wno-c++11-narrowing)
else ()
diff --git a/be/src/common/stack_trace.cpp b/be/src/common/stack_trace.cpp
index 440c8d54f5b..4adcd108af6 100644
--- a/be/src/common/stack_trace.cpp
+++ b/be/src/common/stack_trace.cpp
@@ -24,9 +24,11 @@
#include <atomic>
#include <filesystem>
+#include <limits>
#include <map>
#include <mutex>
#include <sstream>
+#include <string_view>
#include <unordered_map>
#include "common/config.h"
@@ -288,7 +290,12 @@ StackTrace::StackTrace(const ucontext_t& signal_context) {
} else {
/// Skip excessive stack frames that we have created while finding
stack trace.
for (size_t i = 0; i < size; ++i) {
- if (frame_pointers[i] == caller_address) {
+ const auto frame_address =
reinterpret_cast<uintptr_t>(frame_pointers[i]);
+ const auto caller_address_value =
reinterpret_cast<uintptr_t>(caller_address);
+ if (caller_address_value != 0 &&
+ (frame_address == caller_address_value ||
+ (caller_address_value < std::numeric_limits<uintptr_t>::max()
&&
+ frame_address == caller_address_value + 1))) {
offset = i;
break;
}
@@ -329,20 +336,27 @@ struct StackTraceRefTriple {
const StackTrace::FramePointers& pointers;
size_t offset;
size_t size;
+ std::string_view dwarf_location_info_mode;
};
struct StackTraceTriple {
StackTrace::FramePointers pointers;
size_t offset;
size_t size;
+ std::string dwarf_location_info_mode;
};
template <class T>
concept MaybeRef = std::is_same_v<T, StackTraceTriple> || std::is_same_v<T,
StackTraceRefTriple>;
constexpr bool operator<(const MaybeRef auto& left, const MaybeRef auto&
right) {
- return std::tuple {left.pointers, left.size, left.offset} <
- std::tuple {right.pointers, right.size, right.offset};
+ // The same PCs can be rendered with different DWARF detail levels.
Keeping the mode in the
+ // key prevents a cheap disabled rendering from poisoning a later full
rendering, and prevents
+ // full file/line detail from leaking into a disabled request.
+ return std::tuple {left.pointers, left.size, left.offset,
+ std::string_view(left.dwarf_location_info_mode)} <
+ std::tuple {right.pointers, right.size, right.offset,
+ std::string_view(right.dwarf_location_info_mode)};
}
static void toStringEveryLineImpl([[maybe_unused]] const std::string
dwarf_location_info_mode,
@@ -427,7 +441,8 @@ static void toStringEveryLineImpl([[maybe_unused]] const
std::string dwarf_locat
}
void StackTrace::toStringEveryLine(std::function<void(std::string_view)>
callback) const {
- toStringEveryLineImpl("FULL_WITH_INLINE", {frame_pointers, offset, size},
std::move(callback));
+ toStringEveryLineImpl("FULL_WITH_INLINE", {frame_pointers, offset, size,
"FULL_WITH_INLINE"},
+ std::move(callback));
}
using StackTraceCache = std::map<StackTraceTriple, std::string, std::less<>>;
@@ -447,7 +462,7 @@ std::string toStringCached(const StackTrace::FramePointers&
pointers, size_t off
std::lock_guard lock {stacktrace_cache_mutex};
StackTraceCache& cache = cacheInstance();
- const StackTraceRefTriple key {pointers, offset, size};
+ const StackTraceRefTriple key {pointers, offset, size,
dwarf_location_info_mode};
if (auto it = cache.find(key); it != cache.end()) {
return it->second;
@@ -456,7 +471,10 @@ std::string toStringCached(const
StackTrace::FramePointers& pointers, size_t off
toStringEveryLineImpl(dwarf_location_info_mode, key,
[&](std::string_view str) { out << str << '\n';
});
- return cache.emplace(StackTraceTriple {pointers, offset, size},
out.str()).first->second;
+ return cache
+ .emplace(StackTraceTriple {pointers, offset, size,
dwarf_location_info_mode},
+ out.str())
+ .first->second;
}
}
diff --git a/be/src/service/http/action/be_thread_stack_action.cpp
b/be/src/service/http/action/be_thread_stack_action.cpp
new file mode 100644
index 00000000000..79241afd8b1
--- /dev/null
+++ b/be/src/service/http/action/be_thread_stack_action.cpp
@@ -0,0 +1,1215 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "service/http/action/be_thread_stack_action.h"
+
+#include <fmt/format.h>
+
+#ifdef __linux__
+#include <fcntl.h>
+#include <poll.h>
+#include <sys/syscall.h>
+#include <ucontext.h>
+#include <unistd.h>
+
+#include <algorithm>
+#include <array>
+#include <atomic>
+#include <cctype>
+#include <cerrno>
+#include <chrono>
+#include <csignal>
+#include <cstdint>
+#include <cstdlib>
+#include <cstring>
+#include <filesystem>
+#include <fstream>
+#include <limits>
+#include <mutex>
+#include <optional>
+#include <sstream>
+#include <string>
+#include <string_view>
+#include <thread>
+#include <vector>
+
+#if defined(USE_UNWIND) && USE_UNWIND && defined(__x86_64__)
+#ifndef UNW_LOCAL_ONLY
+#define UNW_LOCAL_ONLY
+#endif
+#include <libunwind.h>
+#endif
+#endif
+
+#include "common/logging.h"
+#include "common/stack_trace.h"
+#include "service/http/http_channel.h"
+#include "service/http/http_headers.h"
+#include "service/http/http_request.h"
+#include "service/http/http_status.h"
+
+namespace doris {
+
+namespace {
+
+constexpr std::string_view HEADER_TEXT = "text/plain; charset=utf-8";
+
+#ifdef __linux__
+
+constexpr int STACK_TRACE_SIGNAL_OFFSET = 6;
+constexpr int DEFAULT_TIMEOUT_MS = 100;
+constexpr int MAX_TIMEOUT_MS = 10000;
+constexpr size_t MAX_MEMORY_RANGES = 8192;
+constexpr std::string_view DEFAULT_DWARF_MODE = "FAST";
+
+struct ThreadInfo {
+ pid_t tid = 0;
+ std::string name;
+};
+
+struct MemoryRange {
+ uintptr_t begin = 0;
+ uintptr_t end = 0;
+};
+
+enum class FramePointerStatus {
+ END_OF_CHAIN,
+ NO_CONTEXT,
+ UNSUPPORTED_ARCH,
+ NO_STACK_RANGE,
+ INVALID_FRAME_POINTER,
+ FRAME_LIMIT,
+};
+
+enum class SignalContextUnwindStatus {
+ NOT_ATTEMPTED,
+ END_OF_STACK,
+ INIT_ERROR,
+ GET_IP_ERROR,
+ STEP_ERROR,
+ FRAME_LIMIT,
+ UNSUPPORTED,
+};
+
+struct FramePointerCapture {
+ StackTrace::FramePointers frame_pointers {};
+ size_t size = 0;
+ uintptr_t stack_begin = 0;
+ uintptr_t stack_end = 0;
+ FramePointerStatus fp_status = FramePointerStatus::NO_CONTEXT;
+ bool used_signal_context_unwind = false;
+ SignalContextUnwindStatus signal_context_unwind_status =
+ SignalContextUnwindStatus::NOT_ATTEMPTED;
+ int signal_context_unwind_error = 0;
+};
+
+struct ThreadSyscall {
+ long number = -1;
+ std::string name;
+};
+
+std::once_flag g_install_signal_once;
+std::mutex g_collect_mutex;
+std::atomic<pid_t> g_server_pid {0};
+std::atomic<int> g_sequence {0};
+// The signal handler cannot allocate per-request state safely, so it
publishes into one
+// process-wide slot. The latch protects that slot from nested or back-to-back
signals while the
+// coordinator is still copying or unwinding the previous thread's context.
+std::atomic<int> g_active_sequence {0};
+std::atomic<int> g_data_ready_sequence {0};
+std::atomic<int> g_unwind_wait_sequence {0};
+std::atomic<int> g_unwind_release_sequence {0};
+std::atomic<bool> g_signal_latch {false};
+FramePointerCapture g_signal_capture;
+ucontext_t g_signal_context {};
+std::array<MemoryRange, MAX_MEMORY_RANGES> g_memory_ranges {};
+size_t g_memory_range_count = 0;
+int g_notification_pipe[2] = {-1, -1};
+
+int rt_tgsigqueueinfo(pid_t tgid, pid_t tid, int sig, siginfo_t* info) {
+ return static_cast<int>(syscall(__NR_rt_tgsigqueueinfo, tgid, tid, sig,
info));
+}
+
+int stack_trace_signal() {
+ static const int signal = [] {
+ const int candidate = SIGRTMIN + STACK_TRACE_SIGNAL_OFFSET;
+ return candidate <= SIGRTMAX ? candidate : -1;
+ }();
+ return signal;
+}
+
+pid_t get_current_tid() {
+ return static_cast<pid_t>(syscall(SYS_gettid));
+}
+
+bool read_signal_registers(const ucontext_t* context, uintptr_t* pc,
uintptr_t* fp, uintptr_t* sp) {
+ if (context == nullptr) {
+ return false;
+ }
+
+#if defined(__x86_64__)
+ *pc = static_cast<uintptr_t>(context->uc_mcontext.gregs[REG_RIP]);
+ *fp = static_cast<uintptr_t>(context->uc_mcontext.gregs[REG_RBP]);
+ *sp = static_cast<uintptr_t>(context->uc_mcontext.gregs[REG_RSP]);
+ return true;
+#elif defined(__aarch64__)
+ *pc = static_cast<uintptr_t>(context->uc_mcontext.pc);
+ *fp = static_cast<uintptr_t>(context->uc_mcontext.regs[29]);
+ *sp = static_cast<uintptr_t>(context->uc_mcontext.sp);
+ return true;
+#else
+ return false;
+#endif
+}
+
+bool range_contains(const MemoryRange& range, uintptr_t address) {
+ return address >= range.begin && address < range.end;
+}
+
+bool frame_record_is_readable(const MemoryRange& range, uintptr_t fp) {
+ constexpr uintptr_t frame_record_size = sizeof(uintptr_t) * 2;
+ // The interrupted register can come from code without frame pointers or
from a prologue/
+ // epilogue. Bounds plus alignment keep the handler from reinterpreting
arbitrary stack bytes
+ // as a frame record.
+ return fp % alignof(uintptr_t) == 0 && fp >= range.begin &&
+ fp <= std::numeric_limits<uintptr_t>::max() - frame_record_size &&
+ fp + frame_record_size <= range.end;
+}
+
+const MemoryRange* find_stack_range(uintptr_t sp, uintptr_t fp) {
+ for (size_t i = 0; i < g_memory_range_count; ++i) {
+ const auto& range = g_memory_ranges[i];
+ if (range_contains(range, sp) && range_contains(range, fp)) {
+ return ⦥
+ }
+ }
+ return nullptr;
+}
+
+void append_frame(FramePointerCapture* capture, uintptr_t pc) {
+ if (pc == 0 || capture->size >= capture->frame_pointers.size()) {
+ return;
+ }
+ capture->frame_pointers[capture->size++] = reinterpret_cast<void*>(pc);
+}
+
+void capture_frame_pointers_from_context(const ucontext_t* context,
FramePointerCapture* capture) {
+ *capture = FramePointerCapture {};
+
+ uintptr_t pc = 0;
+ uintptr_t fp = 0;
+ uintptr_t sp = 0;
+ if (!read_signal_registers(context, &pc, &fp, &sp)) {
+#if defined(__x86_64__) || defined(__aarch64__)
+ capture->fp_status = FramePointerStatus::NO_CONTEXT;
+#else
+ capture->fp_status = FramePointerStatus::UNSUPPORTED_ARCH;
+#endif
+ return;
+ }
+
+ append_frame(capture, pc);
+ const MemoryRange* stack_range = find_stack_range(sp, fp);
+ if (stack_range == nullptr || fp < sp) {
+ capture->fp_status = FramePointerStatus::NO_STACK_RANGE;
+ return;
+ }
+ if (!frame_record_is_readable(*stack_range, fp)) {
+ capture->fp_status = FramePointerStatus::INVALID_FRAME_POINTER;
+ return;
+ }
+
+ capture->stack_begin = stack_range->begin;
+ capture->stack_end = stack_range->end;
+
+ uintptr_t current_fp = fp;
+ while (capture->size < capture->frame_pointers.size()) {
+ if (!frame_record_is_readable(*stack_range, current_fp)) {
+ capture->fp_status = FramePointerStatus::INVALID_FRAME_POINTER;
+ return;
+ }
+
+ const auto* frame_record = reinterpret_cast<const
uintptr_t*>(current_fp);
+ const uintptr_t next_fp = frame_record[0];
+ const uintptr_t return_address = frame_record[1];
+ append_frame(capture, return_address);
+
+ if (next_fp == 0) {
+ capture->fp_status = FramePointerStatus::END_OF_CHAIN;
+ return;
+ }
+ if (next_fp <= current_fp || !range_contains(*stack_range, next_fp) ||
+ !frame_record_is_readable(*stack_range, next_fp)) {
+ capture->fp_status = FramePointerStatus::INVALID_FRAME_POINTER;
+ return;
+ }
+ current_fp = next_fp;
+ }
+
+ capture->fp_status = FramePointerStatus::FRAME_LIMIT;
+}
+
+bool should_fallback_to_signal_context_unwind(const FramePointerCapture&
capture) {
+ return capture.fp_status != FramePointerStatus::END_OF_CHAIN &&
+ capture.fp_status != FramePointerStatus::FRAME_LIMIT;
+}
+
+void capture_signal_context_unwind(const ucontext_t* context,
FramePointerCapture* capture) {
+#if defined(USE_UNWIND) && USE_UNWIND && defined(__x86_64__)
+ StackTrace::FramePointers frame_pointers {};
+ size_t size = 0;
+
+ unw_cursor_t cursor;
+ auto* unwind_context =
reinterpret_cast<unw_context_t*>(const_cast<ucontext_t*>(context));
+ int rc = unw_init_local2(&cursor, unwind_context, UNW_INIT_SIGNAL_FRAME);
+ if (rc < 0) {
+ capture->signal_context_unwind_status =
SignalContextUnwindStatus::INIT_ERROR;
+ capture->signal_context_unwind_error = rc;
+ return;
+ }
+
+ SignalContextUnwindStatus status = SignalContextUnwindStatus::END_OF_STACK;
+ int unwind_error = 0;
+ while (size < frame_pointers.size()) {
+ unw_word_t ip = 0;
+ rc = unw_get_reg(&cursor, UNW_REG_IP, &ip);
+ if (rc < 0) {
+ status = SignalContextUnwindStatus::GET_IP_ERROR;
+ unwind_error = rc;
+ break;
+ }
+ if (ip != 0) {
+ frame_pointers[size++] = reinterpret_cast<void*>(ip);
+ }
+
+ rc = unw_step(&cursor);
+ if (rc > 0) {
+ continue;
+ }
+ if (rc == 0) {
+ status = SignalContextUnwindStatus::END_OF_STACK;
+ break;
+ }
+ status = SignalContextUnwindStatus::STEP_ERROR;
+ unwind_error = rc;
+ break;
+ }
+ if (size == frame_pointers.size()) {
+ status = SignalContextUnwindStatus::FRAME_LIMIT;
+ }
+
+ capture->signal_context_unwind_status = status;
+ capture->signal_context_unwind_error = unwind_error;
+ if (size > capture->size) {
+ capture->frame_pointers = frame_pointers;
+ capture->size = size;
+ capture->used_signal_context_unwind = true;
+ }
+#else
+ capture->signal_context_unwind_status =
SignalContextUnwindStatus::UNSUPPORTED;
+#endif
+}
+
+// SAFETY: this signal handler never symbolicates, never calls libunwind, and
never calls
+// dl_iterate_phdr or malloc. It walks frame records inside the preloaded
stack mapping and,
+// when frame-pointer walking is insufficient, copies the ucontext_t for the
coordinator thread
+// to unwind while this thread remains paused in the handler.
+void stack_trace_signal_handler(int /*sig*/, siginfo_t* info, void* context) {
+ auto saved_errno = errno;
+
+ if (info == nullptr || info->si_pid !=
g_server_pid.load(std::memory_order_acquire)) {
+ errno = saved_errno;
+ return;
+ }
+
+ const int notification_sequence = info->si_value.sival_int;
+ if (notification_sequence !=
g_active_sequence.load(std::memory_order_acquire)) {
+ errno = saved_errno;
+ return;
+ }
+
+ bool expected = false;
+ if (!g_signal_latch.compare_exchange_strong(expected, true,
std::memory_order_acquire)) {
+ errno = saved_errno;
+ return;
+ }
+
+ const auto* signal_context = reinterpret_cast<const ucontext_t*>(context);
+ capture_frame_pointers_from_context(signal_context, &g_signal_capture);
+ const bool needs_coordinator_unwind =
+ should_fallback_to_signal_context_unwind(g_signal_capture);
+ if (needs_coordinator_unwind) {
+ g_signal_context = *signal_context;
+ g_unwind_wait_sequence.store(notification_sequence,
std::memory_order_release);
+ }
+ g_data_ready_sequence.store(notification_sequence,
std::memory_order_release);
+
+ if (g_notification_pipe[1] >= 0) {
+ ssize_t res = write(g_notification_pipe[1], ¬ification_sequence,
+ sizeof(notification_sequence));
+ (void)res;
+ }
+
+ while (needs_coordinator_unwind &&
+ g_unwind_release_sequence.load(std::memory_order_acquire) !=
notification_sequence &&
+ g_active_sequence.load(std::memory_order_acquire) ==
notification_sequence) {
+#if defined(__x86_64__)
+ __builtin_ia32_pause();
+#else
+ std::atomic_signal_fence(std::memory_order_seq_cst);
+#endif
+ }
+
+ if (needs_coordinator_unwind) {
+ g_unwind_wait_sequence.store(0, std::memory_order_release);
+ }
+ g_signal_latch.store(false, std::memory_order_release);
+ errno = saved_errno;
+}
+
+void install_signal_handler() {
+ if (stack_trace_signal() <= 0) {
+ LOG(FATAL) << "SIGRTMIN+" << STACK_TRACE_SIGNAL_OFFSET << " exceeds
SIGRTMAX";
+ }
+
+ g_server_pid.store(getpid(), std::memory_order_release);
+ if (pipe2(g_notification_pipe, O_CLOEXEC | O_NONBLOCK) != 0) {
+ PLOG(FATAL) << "failed to create stack trace notification pipe";
+ }
+
+ struct sigaction action {};
+ sigemptyset(&action.sa_mask);
+ action.sa_flags = SA_SIGINFO | SA_RESTART;
+ action.sa_sigaction = stack_trace_signal_handler;
+ if (sigaction(stack_trace_signal(), &action, nullptr) != 0) {
+ PLOG(FATAL) << "failed to install BE thread stack trace signal
handler";
+ }
+}
+
+bool parse_int_param(const HttpRequest* req, std::string_view key, int
default_value, int min_value,
+ int max_value, int* value, std::string* error) {
+ const std::string& raw_value = req->param(std::string(key));
+ if (raw_value.empty()) {
+ *value = default_value;
+ return true;
+ }
+
+ char* end = nullptr;
+ errno = 0;
+ long parsed = std::strtol(raw_value.c_str(), &end, 10);
+ if (errno != 0 || end == raw_value.c_str() || *end != '\0') {
+ *error = fmt::format("invalid {}: {}", key, raw_value);
+ return false;
+ }
+ if (parsed < min_value || parsed > max_value) {
+ *error = fmt::format("invalid {}: {}, expected range [{}, {}]", key,
raw_value, min_value,
+ max_value);
+ return false;
+ }
+ *value = static_cast<int>(parsed);
+ return true;
+}
+
+bool parse_bool_param(const HttpRequest* req, std::string_view key, bool
default_value, bool* value,
+ std::string* error) {
+ const std::string& raw_value = req->param(std::string(key));
+ if (raw_value.empty()) {
+ *value = default_value;
+ return true;
+ }
+
+ std::string lower_value = raw_value;
+ std::transform(lower_value.begin(), lower_value.end(), lower_value.begin(),
+ [](unsigned char c) { return
static_cast<char>(std::tolower(c)); });
+ if (lower_value == "true" || lower_value == "1") {
+ *value = true;
+ return true;
+ }
+ if (lower_value == "false" || lower_value == "0") {
+ *value = false;
+ return true;
+ }
+
+ *error = fmt::format("invalid {}: {}, expected one of true, false, 1, 0",
key, raw_value);
+ return false;
+}
+
+bool parse_thread_id_token(std::string_view token, std::string_view
param_name, pid_t* tid,
+ std::string* error) {
+ if (token.empty()) {
+ *error = fmt::format("invalid {}: empty token", param_name);
+ return false;
+ }
+ if (!std::all_of(token.begin(), token.end(),
+ [](unsigned char c) { return std::isdigit(c) != 0; })) {
+ *error = fmt::format("invalid {}: {}", param_name, token);
+ return false;
+ }
+
+ std::string token_copy(token);
+ char* end = nullptr;
+ errno = 0;
+ long parsed = std::strtol(token_copy.c_str(), &end, 10);
+ if (errno != 0 || end == token_copy.c_str() || *end != '\0' || parsed <= 0
||
+ parsed > std::numeric_limits<pid_t>::max()) {
+ *error = fmt::format("invalid {}: {}", param_name, token);
+ return false;
+ }
+
+ *tid = static_cast<pid_t>(parsed);
+ return true;
+}
+
+std::optional<std::vector<pid_t>> parse_thread_id_filter(const HttpRequest*
req,
+ std::string* error) {
+ const std::string& legacy_tid = req->param("tid");
+ const std::string& thread_id = req->param("thread_id");
+ if (!legacy_tid.empty() && !thread_id.empty()) {
+ *error = "tid and thread_id are mutually exclusive";
+ return std::nullopt;
+ }
+
+ const bool use_thread_id = !thread_id.empty();
+ const std::string& raw = use_thread_id ? thread_id : legacy_tid;
+ if (raw.empty()) {
+ return std::nullopt;
+ }
+
+ std::vector<pid_t> tids;
+ const std::string_view param_name = use_thread_id ? "thread_id" : "tid";
+ size_t token_begin = 0;
+ while (token_begin <= raw.size()) {
+ const size_t comma = raw.find(',', token_begin);
+ const size_t token_end = comma == std::string::npos ? raw.size() :
comma;
+ pid_t tid = 0;
+ if (!parse_thread_id_token(
+ std::string_view(raw).substr(token_begin, token_end -
token_begin), param_name,
+ &tid, error)) {
+ return std::nullopt;
+ }
+ tids.push_back(tid);
+ if (comma == std::string::npos) {
+ break;
+ }
+ token_begin = comma + 1;
+ }
+
+ return tids;
+}
+
+bool parse_dwarf_mode(const HttpRequest* req, std::string* mode, std::string*
error) {
+ *mode = req->param("dwarf_location_info_mode");
+ if (mode->empty()) {
+ *mode = req->param("mode");
+ }
+ if (mode->empty()) {
+ *mode = std::string(DEFAULT_DWARF_MODE);
+ return true;
+ }
+
+ std::string lower_mode = *mode;
+ std::transform(lower_mode.begin(), lower_mode.end(), lower_mode.begin(),
+ [](unsigned char c) { return
static_cast<char>(std::tolower(c)); });
+ if (lower_mode == "disabled" || lower_mode == "fast" || lower_mode ==
"full" ||
+ lower_mode == "full_with_inline") {
+ *mode = lower_mode;
+ return true;
+ }
+
+ *error = fmt::format(
+ "invalid dwarf_location_info_mode: {}, expected one of DISABLED,
FAST, "
+ "FULL, FULL_WITH_INLINE",
+ *mode);
+ return false;
+}
+
+std::string read_thread_name(pid_t tid) {
+ std::ifstream comm(fmt::format("/proc/self/task/{}/comm", tid));
+ if (!comm.is_open()) {
+ return "?";
+ }
+ std::string name;
+ std::getline(comm, name);
+ if (name.empty()) {
+ return "?";
+ }
+ return name;
+}
+
+std::vector<ThreadInfo> list_threads(const std::optional<std::vector<pid_t>>&
tid_filter) {
+ std::vector<ThreadInfo> threads;
+
+ if (tid_filter.has_value()) {
+ for (const pid_t tid : *tid_filter) {
+ threads.push_back({tid, read_thread_name(tid)});
+ }
+ return threads;
+ }
+
+ std::error_code ec;
+ for (const auto& entry :
std::filesystem::directory_iterator("/proc/self/task", ec)) {
+ if (ec) {
+ break;
+ }
+ const auto filename = entry.path().filename().string();
+ char* end = nullptr;
+ errno = 0;
+ long tid = std::strtol(filename.c_str(), &end, 10);
+ if (errno != 0 || end == filename.c_str() || *end != '\0' || tid <= 0)
{
+ continue;
+ }
+ threads.push_back({static_cast<pid_t>(tid),
read_thread_name(static_cast<pid_t>(tid))});
+ }
+
+ std::sort(threads.begin(), threads.end(),
+ [](const ThreadInfo& lhs, const ThreadInfo& rhs) { return
lhs.tid < rhs.tid; });
+ return threads;
+}
+
+bool parse_hex_u64(std::string_view value, uint64_t* result) {
+ std::string copy(value);
+ char* end = nullptr;
+ errno = 0;
+ unsigned long long parsed = std::strtoull(copy.c_str(), &end, 16);
+ if (errno != 0 || end == copy.c_str()) {
+ return false;
+ }
+ *result = static_cast<uint64_t>(parsed);
+ return true;
+}
+
+bool parse_maps_range(std::string_view value, MemoryRange* range) {
+ const size_t dash = value.find('-');
+ if (dash == std::string_view::npos) {
+ return false;
+ }
+
+ uint64_t begin = 0;
+ uint64_t end = 0;
+ if (!parse_hex_u64(value.substr(0, dash), &begin) ||
+ !parse_hex_u64(value.substr(dash + 1), &end) || begin >= end) {
+ return false;
+ }
+
+ range->begin = static_cast<uintptr_t>(begin);
+ range->end = static_cast<uintptr_t>(end);
+ return true;
+}
+
+bool wait_for_signal_handler_idle(int timeout_ms) {
+ const auto deadline = std::chrono::steady_clock::now() +
std::chrono::milliseconds(timeout_ms);
+ while (g_signal_latch.load(std::memory_order_acquire)) {
+ if (std::chrono::steady_clock::now() >= deadline) {
+ return false;
+ }
+ std::this_thread::yield();
+ }
+ return true;
+}
+
+bool release_signal_handler_and_wait(int sequence, int timeout_ms) {
+ // A published stack only means the data slot is ready. The target thread
may still be inside
+ // the handler waiting for fallback unwind release; starting the next TID
before the latch drops
+ // can make that next handler return without publishing anything.
+ g_unwind_release_sequence.store(sequence, std::memory_order_release);
+ g_active_sequence.store(0, std::memory_order_release);
+ return wait_for_signal_handler_idle(timeout_ms);
+}
+
+bool load_readable_writable_mappings(int timeout_ms, std::string* error) {
+ // Stack bounds are read before any signal is sent because the handler
must not open /proc,
+ // allocate, or take locks while the target thread is asynchronously
interrupted.
+ g_active_sequence.store(0, std::memory_order_release);
+ if (!wait_for_signal_handler_idle(timeout_ms)) {
+ *error = "previous stack trace signal handler is still running";
+ return false;
+ }
+ g_memory_range_count = 0;
+
+ std::ifstream maps("/proc/self/maps");
+ if (!maps.is_open()) {
+ *error = fmt::format("failed to open /proc/self/maps: {}",
std::strerror(errno));
+ return false;
+ }
+
+ std::string line;
+ while (std::getline(maps, line)) {
+ std::istringstream iss(line);
+ std::string address_range;
+ std::string permissions;
+ if (!(iss >> address_range >> permissions)) {
+ continue;
+ }
+ if (permissions.size() < 2 || permissions[0] != 'r' || permissions[1]
!= 'w') {
+ continue;
+ }
+
+ MemoryRange range;
+ if (!parse_maps_range(address_range, &range)) {
+ continue;
+ }
+ if (g_memory_range_count >= g_memory_ranges.size()) {
+ *error = fmt::format("too many readable writable mappings,
max={}", MAX_MEMORY_RANGES);
+ g_memory_range_count = 0;
+ return false;
+ }
+ g_memory_ranges[g_memory_range_count++] = range;
+ }
+
+ return true;
+}
+
+bool is_signal_blocked(pid_t tid) {
+ // If the target masks the diagnostic signal, the kernel will not run our
handler for that TID.
+ // Detecting it up front turns an otherwise guaranteed timeout into an
explicit output status.
+ std::ifstream status(fmt::format("/proc/self/task/{}/status", tid));
+ if (!status.is_open()) {
+ return false;
+ }
+
+ std::string line;
+ while (std::getline(status, line)) {
+ constexpr std::string_view prefix = "SigBlk:";
+ if (!line.starts_with(prefix)) {
+ continue;
+ }
+
+ uint64_t blocked_mask = 0;
+ if (!parse_hex_u64(std::string_view(line).substr(prefix.size()),
&blocked_mask)) {
+ return false;
+ }
+ const int signal = stack_trace_signal();
+ if (signal <= 0 || signal > 64) {
+ return false;
+ }
+ return (blocked_mask & (uint64_t {1} << (signal - 1))) != 0;
+ }
+ return false;
+}
+
+bool parse_long_token(std::string_view token, long* result) {
+ if (token.empty()) {
+ return false;
+ }
+ std::string copy(token);
+ char* end = nullptr;
+ errno = 0;
+ long parsed = std::strtol(copy.c_str(), &end, 10);
+ if (errno != 0 || end == copy.c_str() || *end != '\0') {
+ return false;
+ }
+ *result = parsed;
+ return true;
+}
+
+std::string syscall_name(long number) {
+ switch (number) {
+#ifdef SYS_read
+ case SYS_read:
+ return "read";
+#endif
+#ifdef SYS_pread64
+ case SYS_pread64:
+ return "pread64";
+#endif
+#ifdef SYS_recvfrom
+ case SYS_recvfrom:
+ return "recvfrom";
+#endif
+#ifdef SYS_recvmsg
+ case SYS_recvmsg:
+ return "recvmsg";
+#endif
+#ifdef SYS_accept
+ case SYS_accept:
+ return "accept";
+#endif
+#ifdef SYS_accept4
+ case SYS_accept4:
+ return "accept4";
+#endif
+#ifdef SYS_poll
+ case SYS_poll:
+ return "poll";
+#endif
+#ifdef SYS_ppoll
+ case SYS_ppoll:
+ return "ppoll";
+#endif
+#ifdef SYS_select
+ case SYS_select:
+ return "select";
+#endif
+#ifdef SYS_pselect6
+ case SYS_pselect6:
+ return "pselect6";
+#endif
+#ifdef SYS_epoll_wait
+ case SYS_epoll_wait:
+ return "epoll_wait";
+#endif
+#ifdef SYS_epoll_pwait
+ case SYS_epoll_pwait:
+ return "epoll_pwait";
+#endif
+#ifdef SYS_epoll_pwait2
+ case SYS_epoll_pwait2:
+ return "epoll_pwait2";
+#endif
+#ifdef SYS_futex
+ case SYS_futex:
+ return "futex";
+#endif
+#ifdef SYS_nanosleep
+ case SYS_nanosleep:
+ return "nanosleep";
+#endif
+#ifdef SYS_clock_nanosleep
+ case SYS_clock_nanosleep:
+ return "clock_nanosleep";
+#endif
+ default:
+ return fmt::format("syscall_{}", number);
+ }
+}
+
+bool is_interrupt_sensitive_syscall(long number) {
+ // This list is only used by the explicit conservative mode. The default
path still samples
+ // these threads so operators do not lose most blocked-worker stacks
during real incidents.
+ switch (number) {
+#ifdef SYS_read
+ case SYS_read:
+#endif
+#ifdef SYS_pread64
+ case SYS_pread64:
+#endif
+#ifdef SYS_recvfrom
+ case SYS_recvfrom:
+#endif
+#ifdef SYS_recvmsg
+ case SYS_recvmsg:
+#endif
+#ifdef SYS_accept
+ case SYS_accept:
+#endif
+#ifdef SYS_accept4
+ case SYS_accept4:
+#endif
+#ifdef SYS_poll
+ case SYS_poll:
+#endif
+#ifdef SYS_ppoll
+ case SYS_ppoll:
+#endif
+#ifdef SYS_select
+ case SYS_select:
+#endif
+#ifdef SYS_pselect6
+ case SYS_pselect6:
+#endif
+#ifdef SYS_epoll_wait
+ case SYS_epoll_wait:
+#endif
+#ifdef SYS_epoll_pwait
+ case SYS_epoll_pwait:
+#endif
+#ifdef SYS_epoll_pwait2
+ case SYS_epoll_pwait2:
+#endif
+#ifdef SYS_futex
+ case SYS_futex:
+#endif
+#ifdef SYS_nanosleep
+ case SYS_nanosleep:
+#endif
+#ifdef SYS_clock_nanosleep
+ case SYS_clock_nanosleep:
+#endif
+ return true;
+ default:
+ return false;
+ }
+}
+
+std::optional<ThreadSyscall> current_interrupt_sensitive_syscall(pid_t tid) {
+ std::ifstream syscall_file(fmt::format("/proc/self/task/{}/syscall", tid));
+ if (!syscall_file.is_open()) {
+ return std::nullopt;
+ }
+
+ std::string token;
+ syscall_file >> token;
+ if (token.empty() || token == "running") {
+ return std::nullopt;
+ }
+
+ long number = -1;
+ if (!parse_long_token(token, &number) ||
!is_interrupt_sensitive_syscall(number)) {
+ return std::nullopt;
+ }
+ return ThreadSyscall {.number = number, .name = syscall_name(number)};
+}
+
+std::string fp_status_to_string(FramePointerStatus status) {
+ switch (status) {
+ case FramePointerStatus::END_OF_CHAIN:
+ return "end_of_chain";
+ case FramePointerStatus::NO_CONTEXT:
+ return "no_context";
+ case FramePointerStatus::UNSUPPORTED_ARCH:
+ return "unsupported_arch";
+ case FramePointerStatus::NO_STACK_RANGE:
+ return "no_stack_range";
+ case FramePointerStatus::INVALID_FRAME_POINTER:
+ return "invalid_frame_pointer";
+ case FramePointerStatus::FRAME_LIMIT:
+ return "frame_limit";
+ }
+ return "unknown";
+}
+
+std::string signal_context_unwind_status_to_string(SignalContextUnwindStatus
status) {
+ switch (status) {
+ case SignalContextUnwindStatus::NOT_ATTEMPTED:
+ return "not_attempted";
+ case SignalContextUnwindStatus::END_OF_STACK:
+ return "end_of_stack";
+ case SignalContextUnwindStatus::INIT_ERROR:
+ return "init_error";
+ case SignalContextUnwindStatus::GET_IP_ERROR:
+ return "get_ip_error";
+ case SignalContextUnwindStatus::STEP_ERROR:
+ return "step_error";
+ case SignalContextUnwindStatus::FRAME_LIMIT:
+ return "frame_limit";
+ case SignalContextUnwindStatus::UNSUPPORTED:
+ return "unsupported";
+ }
+ return "unknown";
+}
+
+std::string describe_frame_pointer_capture(const FramePointerCapture& capture)
{
+ std::stringstream out;
+ out << "capture_method="
+ << (capture.used_signal_context_unwind ? "signal_context_libunwind" :
"frame_pointer");
+ out << " frames=" << capture.size;
+ out << " fp_status=" << fp_status_to_string(capture.fp_status);
+ if (capture.signal_context_unwind_status !=
SignalContextUnwindStatus::NOT_ATTEMPTED) {
+ out << " unwind_status="
+ <<
signal_context_unwind_status_to_string(capture.signal_context_unwind_status);
+ if (capture.signal_context_unwind_error != 0) {
+ out << " unwind_error=" << capture.signal_context_unwind_error;
+ }
+ }
+ if (capture.stack_begin != 0 || capture.stack_end != 0) {
+ out << fmt::format(" stack_bounds=0x{:x}-0x{:x}", capture.stack_begin,
capture.stack_end);
+ }
+ return out.str();
+}
+
+bool wait_for_stack_trace(int sequence, int timeout_ms) {
+ const auto deadline = std::chrono::steady_clock::now() +
std::chrono::milliseconds(timeout_ms);
+
+ while (true) {
+ if (g_data_ready_sequence.load(std::memory_order_acquire) == sequence)
{
+ return true;
+ }
+
+ int remaining_ms =
static_cast<int>(std::chrono::duration_cast<std::chrono::milliseconds>(
+ deadline -
std::chrono::steady_clock::now())
+ .count());
+ if (remaining_ms < 0) {
+ return false;
+ }
+
+ pollfd poll_fd {g_notification_pipe[0], POLLIN, 0};
+ int poll_res = poll(&poll_fd, 1, remaining_ms);
+ if (poll_res < 0) {
+ if (errno == EINTR) {
+ continue;
+ }
+ return false;
+ }
+ if (poll_res == 0) {
+ return false;
+ }
+
+ while (true) {
+ int notification_sequence = 0;
+ ssize_t read_res = read(g_notification_pipe[0],
¬ification_sequence,
+ sizeof(notification_sequence));
+ if (read_res < 0) {
+ if (errno == EINTR) {
+ continue;
+ }
+ if (errno == EAGAIN || errno == EWOULDBLOCK) {
+ break;
+ }
+ return false;
+ }
+ if (read_res != sizeof(notification_sequence)) {
+ return false;
+ }
+ if (notification_sequence == sequence &&
+ g_data_ready_sequence.load(std::memory_order_acquire) ==
sequence) {
+ return true;
+ }
+ }
+ }
+}
+
+std::string symbolize_stack_trace(const FramePointerCapture& capture,
+ const std::string& dwarf_mode) {
+ StackTrace::FramePointers frame_pointers = capture.frame_pointers;
+ return StackTrace::toString(frame_pointers.data(), 0, capture.size,
dwarf_mode);
+}
+
+std::string capture_current_thread_stack(const std::string& dwarf_mode) {
+ return StackTrace().toString(-3, dwarf_mode);
+}
+
+enum class CaptureStatus {
+ OK,
+ CURRENT_THREAD,
+ SKIPPED_BLOCKING_SYSCALL,
+ SIGNAL_BLOCKED,
+ THREAD_EXITED,
+ SIGNAL_ERROR,
+ TIMEOUT,
+};
+
+struct CaptureResult {
+ CaptureStatus status = CaptureStatus::TIMEOUT;
+ std::string stack;
+ std::string error;
+ std::string diagnostic;
+};
+
+CaptureResult capture_thread_stack(pid_t tid, const std::string& dwarf_mode,
int timeout_ms,
+ bool skip_blocking_syscalls) {
+ if (tid == get_current_tid()) {
+ return {CaptureStatus::CURRENT_THREAD,
capture_current_thread_stack(dwarf_mode), "",
+ "capture_method=current_thread_stacktrace"};
+ }
+
+ if (skip_blocking_syscalls) {
+ if (auto syscall = current_interrupt_sensitive_syscall(tid)) {
+ return {CaptureStatus::SKIPPED_BLOCKING_SYSCALL, "", "",
+ fmt::format("syscall={} syscall_number={}", syscall->name,
syscall->number)};
+ }
+ }
+
+ if (is_signal_blocked(tid)) {
+ return {CaptureStatus::SIGNAL_BLOCKED, "", "", ""};
+ }
+
+ // The handler publishes through process-global state, not per-thread
storage. Waiting here is
+ // the guardrail that keeps a previous slow-to-exit handler from causing
this TID's signal to be
+ // dropped by the latch CAS.
+ if (!wait_for_signal_handler_idle(timeout_ms)) {
+ return {CaptureStatus::TIMEOUT, "", "",
"previous_signal_handler_still_running"};
+ }
+
+ int sequence = g_sequence.fetch_add(1, std::memory_order_acq_rel) + 1;
+ g_unwind_release_sequence.store(0, std::memory_order_release);
+ g_unwind_wait_sequence.store(0, std::memory_order_release);
+ g_active_sequence.store(sequence, std::memory_order_release);
+ siginfo_t signal_info {};
+ signal_info.si_code = SI_QUEUE;
+ signal_info.si_pid = g_server_pid.load(std::memory_order_acquire);
+ signal_info.si_uid = getuid();
+ signal_info.si_value.sival_int = sequence;
+
+ if (rt_tgsigqueueinfo(g_server_pid.load(std::memory_order_acquire), tid,
stack_trace_signal(),
+ &signal_info) != 0) {
+ g_active_sequence.store(0, std::memory_order_release);
+ if (errno == ESRCH) {
+ return {CaptureStatus::THREAD_EXITED, "", "", ""};
+ }
+ return {CaptureStatus::SIGNAL_ERROR, "", std::strerror(errno), ""};
+ }
+
+ if (!wait_for_stack_trace(sequence, timeout_ms)) {
+ const bool handler_idle = release_signal_handler_and_wait(sequence,
timeout_ms);
+ return {CaptureStatus::TIMEOUT, "", "",
+ handler_idle ? "" : "signal_handler_release_timeout"};
+ }
+
+ FramePointerCapture capture = g_signal_capture;
+ if (should_fallback_to_signal_context_unwind(capture) &&
+ g_unwind_wait_sequence.load(std::memory_order_acquire) == sequence) {
+ capture_signal_context_unwind(&g_signal_context, &capture);
+ }
+ if (!release_signal_handler_and_wait(sequence, timeout_ms)) {
+ // Returning the captured stack while the target is still in the
handler would hide a much
+ // more serious diagnostic side effect. Treat it as a timeout so the
summary reflects the
+ // release failure explicitly.
+ return {CaptureStatus::TIMEOUT, "", "",
"signal_handler_release_timeout"};
+ }
+
+ return {CaptureStatus::OK, symbolize_stack_trace(capture, dwarf_mode), "",
+ describe_frame_pointer_capture(capture)};
+}
+
+std::string status_to_string(CaptureStatus status) {
+ switch (status) {
+ case CaptureStatus::OK:
+ return "ok";
+ case CaptureStatus::CURRENT_THREAD:
+ return "ok_current_thread";
+ case CaptureStatus::SKIPPED_BLOCKING_SYSCALL:
+ return "skipped_blocking_syscall";
+ case CaptureStatus::SIGNAL_BLOCKED:
+ return "signal_blocked";
+ case CaptureStatus::THREAD_EXITED:
+ return "thread_exited";
+ case CaptureStatus::SIGNAL_ERROR:
+ return "signal_error";
+ case CaptureStatus::TIMEOUT:
+ return "timeout";
+ }
+ return "unknown";
+}
+
+void append_thread_result(std::stringstream& out, const ThreadInfo& thread,
+ const CaptureResult& result) {
+ out << "----- thread " << thread.tid << " (" << thread.name
+ << ") status=" << status_to_string(result.status);
+ if (!result.diagnostic.empty()) {
+ out << ' ' << result.diagnostic;
+ }
+ if (!result.error.empty()) {
+ out << " error=\"" << result.error << "\"";
+ }
+ out << " -----\n";
+
+ if (result.stack.empty()) {
+ out << "<no stack captured>\n\n";
+ return;
+ }
+ out << result.stack;
+ if (!result.stack.ends_with('\n')) {
+ out << '\n';
+ }
+ out << '\n';
+}
+
+#endif // __linux__
+
+} // namespace
+
+void BeThreadStackAction::handle(HttpRequest* req) {
+ req->add_output_header(HttpHeaders::CONTENT_TYPE, HEADER_TEXT.data());
+
+#ifndef __linux__
+ HttpChannel::send_reply(req, HttpStatus::NOT_IMPLEMENTED,
+ "BE thread stack trace is only supported on
Linux.\n");
+#else
+ std::call_once(g_install_signal_once, install_signal_handler);
+
+ int timeout_ms = DEFAULT_TIMEOUT_MS;
+ std::string error;
+ if (!parse_int_param(req, "timeout_ms", DEFAULT_TIMEOUT_MS, 1,
MAX_TIMEOUT_MS, &timeout_ms,
+ &error)) {
+ HttpChannel::send_reply(req, HttpStatus::BAD_REQUEST, error + "\n");
+ return;
+ }
+
+ bool skip_blocking_syscalls = false;
+ if (!parse_bool_param(req, "skip_blocking_syscalls", false,
&skip_blocking_syscalls, &error)) {
+ HttpChannel::send_reply(req, HttpStatus::BAD_REQUEST, error + "\n");
+ return;
+ }
+
+ std::optional<std::vector<pid_t>> tid_filter = parse_thread_id_filter(req,
&error);
+ if (!error.empty()) {
+ HttpChannel::send_reply(req, HttpStatus::BAD_REQUEST, error + "\n");
+ return;
+ }
+
+ std::string dwarf_mode;
+ if (!parse_dwarf_mode(req, &dwarf_mode, &error)) {
+ HttpChannel::send_reply(req, HttpStatus::BAD_REQUEST, error + "\n");
+ return;
+ }
+
+ // The signal handler state is process-global and intentionally
single-slot, so concurrent HTTP
+ // requests would corrupt capture ownership rather than merely interleave
response text.
+ std::unique_lock<std::mutex> lock(g_collect_mutex, std::try_to_lock);
+ if (!lock.owns_lock()) {
+ HttpChannel::send_reply(req, HttpStatus::CONFLICT,
+ "another BE thread stack trace request is
running\n");
+ return;
+ }
+
+ auto threads = list_threads(tid_filter);
+ if (!load_readable_writable_mappings(timeout_ms, &error)) {
+ HttpChannel::send_reply(req, HttpStatus::INTERNAL_SERVER_ERROR, error
+ "\n");
+ return;
+ }
+
+ std::stringstream out;
+ out << "BE thread stack traces\n";
+ out << "pid: " << g_server_pid.load(std::memory_order_acquire) << '\n';
+ out << "service_signal: " << stack_trace_signal() << '\n';
+ out << "thread_count: " << threads.size() << '\n';
+ out << "timeout_ms_per_thread: " << timeout_ms << '\n';
+ out << "dwarf_location_info_mode: " << dwarf_mode << '\n';
+ out << "skip_blocking_syscalls: " << (skip_blocking_syscalls ? "true" :
"false") << '\n';
+ out << "signal_handler_unwinder: "
+
"frame_pointer_with_coordinator_signal_context_libunwind_fallback\n\n";
+
+ int captured = 0;
+ int skipped = 0;
+ int timed_out = 0;
+ int remote_signal_attempts = 0;
+
+ for (const auto& thread : threads) {
+ CaptureResult result =
+ capture_thread_stack(thread.tid, dwarf_mode, timeout_ms,
skip_blocking_syscalls);
+ switch (result.status) {
+ case CaptureStatus::OK:
+ ++remote_signal_attempts;
+ ++captured;
+ break;
+ case CaptureStatus::CURRENT_THREAD:
+ ++captured;
+ break;
+ case CaptureStatus::TIMEOUT:
+ ++remote_signal_attempts;
+ ++timed_out;
+ break;
+ case CaptureStatus::SIGNAL_ERROR:
+ ++remote_signal_attempts;
+ ++skipped;
+ break;
+ case CaptureStatus::SKIPPED_BLOCKING_SYSCALL:
+ case CaptureStatus::SIGNAL_BLOCKED:
+ case CaptureStatus::THREAD_EXITED:
+ ++skipped;
+ break;
+ }
+ append_thread_result(out, thread, result);
+ }
+
+ out << "summary: captured=" << captured << " skipped=" << skipped << "
timed_out=" << timed_out
+ << " remote_signal_attempts=" << remote_signal_attempts << '\n';
+ HttpChannel::send_reply(req, HttpStatus::OK, out.str());
+#endif
+}
+
+} // namespace doris
diff --git a/be/src/service/http/action/be_thread_stack_action.h
b/be/src/service/http/action/be_thread_stack_action.h
new file mode 100644
index 00000000000..24761da22d0
--- /dev/null
+++ b/be/src/service/http/action/be_thread_stack_action.h
@@ -0,0 +1,43 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "service/http/http_handler_with_auth.h"
+
+namespace doris {
+
+class ExecEnv;
+class HttpRequest;
+
+class BeThreadStackAction : public HttpHandlerWithAuth {
+public:
+ BeThreadStackAction(ExecEnv* exec_env) : HttpHandlerWithAuth(exec_env) {}
+ ~BeThreadStackAction() override = default;
+
+ // GET
/api/stack_trace[?thread_id=<tid>[,<tid>...]][&tid=<tid>][&timeout_ms=<ms>]
+ // [&mode=<DISABLED|FAST|FULL|FULL_WITH_INLINE>]
+ //
[&dwarf_location_info_mode=<DISABLED|FAST|FULL|FULL_WITH_INLINE>]
+ // [&skip_blocking_syscalls=<true|false>]
+ // thread_id is the preferred selector; tid is kept as a legacy alias. The
default is full
+ // process collection without a signal-attempt cap, because production
incidents usually need
+ // blocked worker stacks as much as running stacks. skip_blocking_syscalls
is an explicit
+ // conservative mode for isolating EINTR-sensitive paths, not the normal
capture policy.
+ void handle(HttpRequest* req) override;
+};
+
+} // namespace doris
diff --git a/be/src/service/http_service.cpp b/be/src/service/http_service.cpp
index 5fc4390eb7e..2172c0ebbd7 100644
--- a/be/src/service/http_service.cpp
+++ b/be/src/service/http_service.cpp
@@ -35,6 +35,7 @@
#include "service/http/action/adjust_log_level.h"
#include "service/http/action/batch_download_action.h"
#include "service/http/action/be_proc_thread_action.h"
+#include "service/http/action/be_thread_stack_action.h"
#include "service/http/action/calc_file_crc_action.h"
#include "service/http/action/check_encryption_action.h"
#include "service/http/action/check_rpc_channel_action.h"
@@ -196,6 +197,10 @@ Status HttpService::start() {
_ev_http_server->register_handler(HttpMethod::GET,
"/api/be_process_thread_num",
be_proc_thread_action);
+ // Dump C++ stack traces for current BE threads.
+ BeThreadStackAction* be_thread_stack_action = _pool.add(new
BeThreadStackAction(_env));
+ _ev_http_server->register_handler(HttpMethod::GET, "/api/stack_trace",
be_thread_stack_action);
+
// Register BE LoadStream action
LoadStreamAction* load_stream_action = _pool.add(new
LoadStreamAction(_env));
_ev_http_server->register_handler(HttpMethod::GET, "/api/load_streams",
load_stream_action);
diff --git a/be/src/util/thrift_server.cpp b/be/src/util/thrift_server.cpp
index d5b72ddfdcb..b89d95adb22 100644
--- a/be/src/util/thrift_server.cpp
+++ b/be/src/util/thrift_server.cpp
@@ -31,13 +31,16 @@
#include <thrift/transport/TServerSocket.h>
#include <thrift/transport/TSocket.h>
#include <thrift/transport/TTransport.h>
+#include <thrift/transport/TTransportException.h>
// IWYU pragma: no_include <bits/chrono.h>
#include <chrono> // IWYU pragma: keep
#include <condition_variable>
#include <memory>
#include <mutex>
#include <sstream>
+#include <string_view>
#include <thread>
+#include <utility>
#include "common/config.h"
#include "common/metrics/doris_metrics.h"
@@ -60,6 +63,84 @@
DEFINE_GAUGE_METRIC_PROTOTYPE_3ARG(thrift_current_connections, MetricUnit::CONNE
DEFINE_COUNTER_METRIC_PROTOTYPE_3ARG(thrift_connections_total,
MetricUnit::CONNECTIONS,
"Total connections made over the lifetime
of this server");
+bool is_unknown_eintr(const apache::thrift::transport::TTransportException& e)
{
+ return e.getType() ==
apache::thrift::transport::TTransportException::UNKNOWN &&
+ std::string_view(e.what()).find("Interrupted system call") !=
std::string_view::npos;
+}
+
+// Full-process stack collection sends a diagnostic signal to many BE threads.
Even with
+// SA_RESTART, Linux can still return EINTR from poll/select/epoll-style
waits, and Thrift wraps
+// that as an UNKNOWN TTransportException with "Interrupted system call". If
this exception escapes
+// the blocking thrift server loop, heartbeat/backend thrift services can exit
while the BE process
+// is still partly alive, making FE report the BE as Alive=false. This wrapper
keeps the normal
+// TServerSocket behavior but retries only that narrow EINTR case on
accept/read/peek; all other
+// transport errors are still propagated.
+class ImprovedServerSocket final : public
apache::thrift::transport::TServerSocket {
+ using TConfiguration = apache::thrift::TConfiguration;
+ using TServerSocket = apache::thrift::transport::TServerSocket;
+ using TSocket = apache::thrift::transport::TSocket;
+ using TTransport = apache::thrift::transport::TTransport;
+ using TTransportException = apache::thrift::transport::TTransportException;
+
+ class RetryingSocket final : public TSocket {
+ public:
+ RetryingSocket(THRIFT_SOCKET socket, std::shared_ptr<THRIFT_SOCKET>
interrupt_listener,
+ std::shared_ptr<TConfiguration> config)
+ : TSocket(socket, std::move(interrupt_listener),
std::move(config)) {}
+
+ uint32_t read(uint8_t* buf, uint32_t len) override {
+ while (true) {
+ try {
+ return TSocket::read(buf, len);
+ } catch (const TTransportException& e) {
+ if (!is_unknown_eintr(e)) {
+ throw;
+ }
+ }
+ }
+ }
+
+ bool peek() override {
+ while (true) {
+ try {
+ return TSocket::peek();
+ } catch (const TTransportException& e) {
+ if (!is_unknown_eintr(e)) {
+ throw;
+ }
+ }
+ }
+ }
+ };
+
+public:
+ ImprovedServerSocket(const std::string& address, int port)
+ : TServerSocket(address, port),
+
_config(std::make_shared<TConfiguration>(config::thrift_max_message_size)) {}
+ ~ImprovedServerSocket() override = default;
+
+protected:
+ std::shared_ptr<TTransport> acceptImpl() override {
+ while (true) {
+ try {
+ return TServerSocket::acceptImpl();
+ } catch (const TTransportException& e) {
+ if (!is_unknown_eintr(e)) {
+ throw;
+ }
+ }
+ }
+ }
+
+ std::shared_ptr<TSocket> createSocket(THRIFT_SOCKET client) override {
+ return std::make_shared<RetryingSocket>(
+ client, interruptableChildren_ ? pChildInterruptSockReader_ :
nullptr, _config);
+ }
+
+private:
+ std::shared_ptr<TConfiguration> _config;
+};
+
// Nonblocking Server socket implementation of TNonblockingServerTransport.
Wrapper around a unix
// socket listen and accept calls.
class ImprovedNonblockingServerSocket : public
apache::thrift::transport::TNonblockingServerSocket {
@@ -381,7 +462,7 @@ Status ThriftServer::start() {
break;
case THREAD_POOL:
- fe_server_transport.reset(new apache::thrift::transport::TServerSocket(
+ fe_server_transport.reset(new ImprovedServerSocket(
BackendOptions::get_service_bind_address_without_bracket(),
_port));
if (transport_factory == nullptr) {
@@ -393,7 +474,7 @@ Status ThriftServer::start() {
break;
case THREADED:
- server_socket = new apache::thrift::transport::TServerSocket(
+ server_socket = new ImprovedServerSocket(
BackendOptions::get_service_bind_address_without_bracket(),
_port);
fe_server_transport.reset(server_socket);
server_socket->setKeepAlive(true);
diff --git a/be/test/service/http/be_thread_stack_action_test.cpp
b/be/test/service/http/be_thread_stack_action_test.cpp
new file mode 100644
index 00000000000..5c5a5d46771
--- /dev/null
+++ b/be/test/service/http/be_thread_stack_action_test.cpp
@@ -0,0 +1,437 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "service/http/action/be_thread_stack_action.h"
+
+#include <gmock/gmock-matchers.h>
+#include <gtest/gtest.h>
+
+#ifdef __linux__
+#include <sys/syscall.h>
+#include <unistd.h>
+
+#include <array>
+#include <atomic>
+#include <cerrno>
+#include <chrono>
+#include <condition_variable>
+#include <cstdint>
+#include <cstring>
+#include <fstream>
+#include <mutex>
+#include <string>
+#include <thread>
+#include <vector>
+#endif
+
+#include "common/config.h"
+#include "common/stack_trace.h"
+#include "service/http/ev_http_server.h"
+#include "service/http/http_client.h"
+#include "service/http/http_method.h"
+
+namespace doris {
+
+#ifdef __linux__
+namespace {
+
+class ParkedMarkerThread {
+public:
+ void start() {
+ _thread = std::thread([this] { run(); });
+ std::unique_lock<std::mutex> lock(_mu);
+ _ready_cv.wait(lock, [this] { return _tid.load() != 0; });
+ }
+
+ void stop() {
+ _stop.store(true);
+ if (_thread.joinable()) {
+ _thread.join();
+ }
+ }
+
+ ~ParkedMarkerThread() { stop(); }
+
+ pid_t tid() const { return _tid.load(); }
+
+private:
+ __attribute__((noinline)) void run() {
+ _tid.store(static_cast<pid_t>(::syscall(SYS_gettid)));
+ {
+ std::lock_guard<std::mutex> lock(_mu);
+ _ready_cv.notify_all();
+ }
+ spin_until_stopped();
+ }
+
+ __attribute__((noinline)) void spin_until_stopped() {
+ while (!_stop.load()) {
+ std::atomic_signal_fence(std::memory_order_seq_cst);
+ }
+ }
+
+ std::thread _thread;
+ std::atomic<pid_t> _tid {0};
+ std::atomic<bool> _stop {false};
+ std::mutex _mu;
+ std::condition_variable _ready_cv;
+};
+
+class BlockingReadThread {
+public:
+ bool start() {
+ if (::pipe(_pipe_fds.data()) != 0) {
+ return false;
+ }
+ _thread = std::thread([this] { run(); });
+ std::unique_lock<std::mutex> lock(_mu);
+ return _ready_cv.wait_for(lock, std::chrono::seconds(5),
+ [this] { return _tid.load() != 0; });
+ }
+
+ void stop() {
+ if (_pipe_fds[1] >= 0) {
+ ::close(_pipe_fds[1]);
+ _pipe_fds[1] = -1;
+ }
+ if (_thread.joinable()) {
+ _thread.join();
+ }
+ if (_pipe_fds[0] >= 0) {
+ ::close(_pipe_fds[0]);
+ _pipe_fds[0] = -1;
+ }
+ }
+
+ ~BlockingReadThread() { stop(); }
+
+ pid_t tid() const { return _tid.load(); }
+ bool read_finished() const { return _read_finished.load(); }
+ int read_errno() const { return _read_errno.load(); }
+
+private:
+ void run() {
+ _tid.store(static_cast<pid_t>(::syscall(SYS_gettid)));
+ {
+ std::lock_guard<std::mutex> lock(_mu);
+ _ready_cv.notify_all();
+ }
+ char byte = 0;
+ ssize_t res = ::read(_pipe_fds[0], &byte, 1);
+ _read_errno.store(res < 0 ? errno : 0);
+ _read_finished.store(true);
+ (void)res;
+ }
+
+ std::array<int, 2> _pipe_fds {-1, -1};
+ std::thread _thread;
+ std::atomic<pid_t> _tid {0};
+ std::atomic<bool> _read_finished {false};
+ std::atomic<int> _read_errno {0};
+ std::mutex _mu;
+ std::condition_variable _ready_cv;
+};
+
+EvHttpServer* s_server = nullptr;
+BeThreadStackAction* s_action = nullptr;
+int s_real_port = 0;
+std::string s_hostname;
+
+Status do_get(const std::string& path, long* http_status, std::string* body) {
+ HttpClient client;
+ RETURN_IF_ERROR(client.init(s_hostname + path,
/*set_fail_on_error=*/false));
+ client.set_method(GET);
+ client.set_timeout_ms(5000);
+ RETURN_IF_ERROR(client.execute(body));
+ *http_status = client.get_http_status();
+ return Status::OK();
+}
+
+std::string thread_header(pid_t tid) {
+ return "----- thread " + std::to_string(tid) + " (";
+}
+
+int count_thread_headers(const std::string& body) {
+ int count = 0;
+ size_t pos = 0;
+ while ((pos = body.find("----- thread ", pos)) != std::string::npos) {
+ ++count;
+ pos += strlen("----- thread ");
+ }
+ return count;
+}
+
+std::string thread_result_line(const std::string& body, pid_t tid) {
+ const std::string header = thread_header(tid);
+ const size_t pos = body.find(header);
+ if (pos == std::string::npos) {
+ return "";
+ }
+ const size_t end = body.find('\n', pos);
+ return body.substr(pos, end == std::string::npos ? std::string::npos : end
- pos);
+}
+
+bool read_thread_syscall(pid_t tid, long* syscall_number) {
+ std::ifstream syscall_file("/proc/self/task/" + std::to_string(tid) +
"/syscall");
+ if (!syscall_file.is_open()) {
+ return false;
+ }
+ syscall_file >> *syscall_number;
+ return !syscall_file.fail();
+}
+
+bool wait_until_syscall(pid_t tid, long expected_syscall) {
+ for (int attempt = 0; attempt < 500; ++attempt) {
+ long syscall_number = -1;
+ if (read_thread_syscall(tid, &syscall_number) && syscall_number ==
expected_syscall) {
+ return true;
+ }
+ std::this_thread::sleep_for(std::chrono::milliseconds(10));
+ }
+ return false;
+}
+
+} // namespace
+
+class BeThreadStackActionTest : public testing::Test {
+protected:
+ static void SetUpTestSuite() {
+ config::enable_all_http_auth = false;
+ s_server = new EvHttpServer(0);
+ s_action = new BeThreadStackAction(nullptr);
+ s_server->register_handler(GET, "/api/stack_trace", s_action);
+ s_server->start();
+ s_real_port = s_server->get_real_port();
+ ASSERT_NE(0, s_real_port);
+ s_hostname = "http://127.0.0.1:" + std::to_string(s_real_port);
+ }
+
+ static void TearDownTestSuite() {
+ delete s_server;
+ s_server = nullptr;
+ delete s_action;
+ s_action = nullptr;
+ config::enable_all_http_auth = false;
+ }
+};
+
+// Covers explicit thread_id filtering and back-to-back remote signal
captures: multiple TIDs must
+// be sampled exactly once, without falling back to full-process enumeration
or timing out the next
+// capture because the previous handler has not fully released its latch yet.
+TEST_F(BeThreadStackActionTest, ThreadIdSelectorSupportsSingleAndMultipleIds) {
+ ParkedMarkerThread first;
+ ParkedMarkerThread second;
+ first.start();
+ second.start();
+
+ long http_status = 0;
+ std::string body;
+ ASSERT_TRUE(do_get("/api/stack_trace?thread_id=" +
std::to_string(first.tid()) + "," +
+ std::to_string(second.tid()) + "&mode=disabled",
+ &http_status, &body)
+ .ok());
+ ASSERT_EQ(200, http_status);
+ EXPECT_THAT(body, testing::HasSubstr("BE thread stack traces\n"));
+ EXPECT_THAT(body, testing::HasSubstr("service_signal: "));
+ EXPECT_THAT(body, testing::HasSubstr("thread_count: 2\n"));
+ EXPECT_THAT(body, testing::HasSubstr("dwarf_location_info_mode:
disabled\n"));
+ EXPECT_THAT(body,
+ testing::HasSubstr(
+ "signal_handler_unwinder: "
+
"frame_pointer_with_coordinator_signal_context_libunwind_fallback\n"));
+ EXPECT_THAT(body, testing::HasSubstr("capture_method="));
+ EXPECT_THAT(body, testing::HasSubstr(thread_header(first.tid())));
+ EXPECT_THAT(body, testing::HasSubstr(thread_header(second.tid())));
+ EXPECT_THAT(body, testing::HasSubstr("summary: captured=2 skipped=0
timed_out=0 "
+ "remote_signal_attempts=2\n"));
+ EXPECT_EQ(2, count_thread_headers(body));
+
+ first.stop();
+ second.stop();
+}
+
+// Covers the legacy tid alias so existing callers still get the same
single-thread capture path.
+TEST_F(BeThreadStackActionTest, TidAliasRemainsSupported) {
+ ParkedMarkerThread marker;
+ marker.start();
+
+ long http_status = 0;
+ std::string body;
+ ASSERT_TRUE(do_get("/api/stack_trace?tid=" + std::to_string(marker.tid())
+ "&mode=disabled",
+ &http_status, &body)
+ .ok());
+ ASSERT_EQ(200, http_status);
+ EXPECT_THAT(body, testing::HasSubstr("thread_count: 1\n"));
+ EXPECT_THAT(body, testing::HasSubstr(thread_header(marker.tid())));
+ EXPECT_THAT(body, testing::HasSubstr("capture_method="));
+ EXPECT_THAT(body, testing::HasSubstr("fp_status="));
+ EXPECT_EQ(1, count_thread_headers(body));
+
+ marker.stop();
+}
+
+// Covers explicit stale TID handling: a missing task should be reported as
thread_exited instead
+// of failing the whole request.
+TEST_F(BeThreadStackActionTest, ExplicitExitedTidIsReported) {
+ long http_status = 0;
+ std::string body;
+ ASSERT_TRUE(
+ do_get("/api/stack_trace?thread_id=999999&mode=disabled",
&http_status, &body).ok());
+ ASSERT_EQ(200, http_status);
+ EXPECT_THAT(body, testing::HasSubstr("thread_count: 1\n"));
+ EXPECT_THAT(body, testing::HasSubstr("----- thread 999999 (?"));
+ EXPECT_THAT(body, testing::HasSubstr("status=thread_exited"));
+}
+
+// Covers the default full-capture policy: a thread blocked in read() should
be signaled, captured,
+// and left blocked in read() after the stack request returns.
+TEST_F(BeThreadStackActionTest, BlockingReadSyscallIsCapturedByDefault) {
+ BlockingReadThread reader;
+ ASSERT_TRUE(reader.start());
+ ASSERT_TRUE(wait_until_syscall(reader.tid(), SYS_read));
+
+ long http_status = 0;
+ std::string body;
+ ASSERT_TRUE(
+ do_get("/api/stack_trace?thread_id=" +
std::to_string(reader.tid()) + "&mode=disabled",
+ &http_status, &body)
+ .ok());
+ ASSERT_EQ(200, http_status);
+ EXPECT_THAT(thread_result_line(body, reader.tid()),
testing::HasSubstr("status=ok"));
+ EXPECT_THAT(thread_result_line(body, reader.tid()),
testing::HasSubstr("capture_method="));
+ EXPECT_THAT(body, testing::HasSubstr("summary: captured=1 skipped=0
timed_out=0 "
+ "remote_signal_attempts=1\n"));
+ EXPECT_FALSE(reader.read_finished())
+ << "read was interrupted with errno=" << reader.read_errno();
+ EXPECT_TRUE(wait_until_syscall(reader.tid(), SYS_read));
+
+ reader.stop();
+}
+
+// Covers the opt-in conservative mode: skip_blocking_syscalls=true must avoid
signaling an
+// interrupt-sensitive read() thread and must report the skipped reason
explicitly.
+TEST_F(BeThreadStackActionTest, BlockingReadSyscallCanBeSkippedExplicitly) {
+ BlockingReadThread reader;
+ ASSERT_TRUE(reader.start());
+ ASSERT_TRUE(wait_until_syscall(reader.tid(), SYS_read));
+
+ long http_status = 0;
+ std::string body;
+ ASSERT_TRUE(do_get("/api/stack_trace?thread_id=" +
std::to_string(reader.tid()) +
+ "&mode=disabled&skip_blocking_syscalls=true",
+ &http_status, &body)
+ .ok());
+ ASSERT_EQ(200, http_status);
+ EXPECT_THAT(body, testing::HasSubstr("skip_blocking_syscalls: true\n"));
+ EXPECT_THAT(thread_result_line(body, reader.tid()),
+ testing::HasSubstr("status=skipped_blocking_syscall
syscall=read "
+ "syscall_number=0"));
+ EXPECT_THAT(body, testing::HasSubstr("<no stack captured>"));
+ EXPECT_THAT(body, testing::HasSubstr("summary: captured=0 skipped=1
timed_out=0 "
+ "remote_signal_attempts=0\n"));
+
+ reader.stop();
+}
+
+// Covers request validation for thread filters, timeout, symbolization mode,
and the syscall-skip
+// flag, so invalid knobs fail before any thread is signaled.
+TEST_F(BeThreadStackActionTest, InvalidParamsReturnBadRequest) {
+ struct InvalidCase {
+ std::string path;
+ std::string message;
+ };
+
+ const std::vector<InvalidCase> cases = {
+ {"/api/stack_trace?thread_id=abc", "invalid thread_id: abc"},
+ {"/api/stack_trace?thread_id=-1", "invalid thread_id: -1"},
+ {"/api/stack_trace?thread_id=1,,2", "invalid thread_id: empty
token"},
+ {"/api/stack_trace?thread_id=,1", "invalid thread_id: empty
token"},
+ {"/api/stack_trace?thread_id=1,", "invalid thread_id: empty
token"},
+ {"/api/stack_trace?thread_id=0", "invalid thread_id: 0"},
+ {"/api/stack_trace?thread_id=2147483648", "invalid thread_id:
2147483648"},
+ {"/api/stack_trace?tid=1&thread_id=2", "tid and thread_id are
mutually exclusive"},
+ {"/api/stack_trace?timeout_ms=0", "invalid timeout_ms: 0"},
+ {"/api/stack_trace?timeout_ms=10001", "invalid timeout_ms: 10001"},
+ {"/api/stack_trace?mode=unknown", "invalid
dwarf_location_info_mode: unknown"},
+ {"/api/stack_trace?skip_blocking_syscalls=maybe",
+ "invalid skip_blocking_syscalls: maybe"},
+ };
+
+ for (const auto& c : cases) {
+ SCOPED_TRACE(c.path);
+ long http_status = 0;
+ std::string body;
+ ASSERT_TRUE(do_get(c.path, &http_status, &body).ok());
+ EXPECT_EQ(400, http_status);
+ EXPECT_THAT(body, testing::HasSubstr(c.message));
+ }
+}
+
+// Covers best-effort symbolization in FAST mode by repeatedly sampling a
stable marker thread until
+// a test frame is visible in the rendered stack.
+TEST_F(BeThreadStackActionTest, BestEffortSymbolizedFrameObserved) {
+ ParkedMarkerThread marker;
+ marker.start();
+
+ bool found = false;
+ for (int attempt = 0; attempt < 100 && !found; ++attempt) {
+ long http_status = 0;
+ std::string body;
+ ASSERT_TRUE(do_get("/api/stack_trace?thread_id=" +
std::to_string(marker.tid()) +
+ "&mode=FAST&timeout_ms=1000",
+ &http_status, &body)
+ .ok());
+ ASSERT_EQ(200, http_status);
+ ASSERT_THAT(body, testing::HasSubstr(thread_header(marker.tid())));
+ if (body.find("ParkedMarkerThread") != std::string::npos ||
+ body.find("spin_until_stopped") != std::string::npos ||
+ body.find("be_thread_stack_action_test") != std::string::npos) {
+ found = true;
+ }
+ }
+ EXPECT_TRUE(found) << "no symbolized marker frame observed in 100
attempts";
+
+ marker.stop();
+}
+
+// Covers StackTrace cache isolation by DWARF mode. The same PCs must not
reuse a cached
+// DISABLED rendering for FAST, or leak FAST file/line output back into
DISABLED.
+TEST_F(BeThreadStackActionTest, StackTraceCacheSeparatesDwarfModes) {
+ StackTrace::dropCache();
+ StackTrace trace;
+ const std::string disabled_first = trace.toString(-3, "disabled");
+ const std::string fast_after_disabled = trace.toString(-3, "fast");
+ ASSERT_THAT(fast_after_disabled,
testing::HasSubstr("be_thread_stack_action_test"));
+ EXPECT_NE(disabled_first, fast_after_disabled);
+
+ StackTrace::dropCache();
+ const std::string fast_first = trace.toString(-3, "fast");
+ const std::string disabled_after_fast = trace.toString(-3, "disabled");
+ EXPECT_EQ(fast_after_disabled, fast_first);
+ EXPECT_EQ(disabled_first, disabled_after_fast);
+ EXPECT_NE(disabled_after_fast, fast_first);
+}
+
+#else
+
+TEST(BeThreadStackActionTest, LinuxOnlyPlaceholder) {
+ GTEST_SKIP() << "BE stack trace HTTP action is Linux-only";
+}
+
+#endif
+
+} // namespace doris
diff --git
a/regression-test/suites/http_rest_api/get/test_be_stack_trace_api.groovy
b/regression-test/suites/http_rest_api/get/test_be_stack_trace_api.groovy
new file mode 100644
index 00000000000..1d7f36d67eb
--- /dev/null
+++ b/regression-test/suites/http_rest_api/get/test_be_stack_trace_api.groovy
@@ -0,0 +1,169 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import org.apache.doris.regression.util.Http
+
+import java.util.concurrent.TimeUnit
+import java.util.concurrent.atomic.AtomicBoolean
+import java.util.concurrent.atomic.AtomicReference
+
+suite("test_be_stack_trace_api", "nonConcurrent") {
+ def marker = "test_be_stack_trace_api_load_marker"
+ def queryThreadNum = 3
+ def loadQuery = """
+ select /* ${marker} */
+ sum(length(md5(concat(cast(number as string), '${marker}'))))
+ from numbers("number" = "1000000000")
+ """
+
+ def aliveBackends = sql_return_maparray("show backends").findAll {
+ "${it.Alive}".equalsIgnoreCase("true") && "${it.HttpPort}" != "-1"
+ }
+ assertTrue(aliveBackends.size() > 0, "No alive backend found for BE stack
trace API test.")
+
+ def beEndpoints = aliveBackends.collect { "${it.Host}:${it.HttpPort}" }
+ logger.info("Alive BE HTTP endpoints for stack trace API test:
${beEndpoints}".toString())
+
+ def loadQueryIds = new LinkedHashSet<String>()
+ def queryFailure = new AtomicReference<Throwable>(null)
+ def stopLoad = new AtomicBoolean(false)
+ def futures = []
+
+ Boolean enableTLS =
(context.config.otherConfigs.get("enableTLS")?.toString()?.equalsIgnoreCase("true"))
?: false
+ if (enableTLS) {
+ // Keep direct BE REST calls consistent with the shared HTTP helper
setup used by FE REST tests.
+ Http.configure(enableTLS,
+ context.config.otherConfigs.get("tlsVerifyMode"),
+ context.config.otherConfigs.get("trustStorePath"),
+ context.config.otherConfigs.get("trustStorePassword"),
+ context.config.otherConfigs.get("trustStoreType"),
+ context.config.otherConfigs.get("keyStorePath"),
+ context.config.otherConfigs.get("keyStorePassword"),
+ context.config.otherConfigs.get("keyStoreType")
+ )
+ }
+
+ def findLoadQueryIds = {
+ def processList = sql_return_maparray("show processlist")
+ processList.each { row ->
+ if ("${row.Info}".contains(marker)) {
+ loadQueryIds.add("${row.QueryId}")
+ }
+ }
+ return loadQueryIds.size()
+ }
+
+ def killLoadQueries = {
+ findLoadQueryIds.call()
+ loadQueryIds.each { queryId ->
+ try {
+ logger.info("Kill BE stack trace regression load query
${queryId}".toString())
+ sql """kill query "${queryId}" """
+ } catch (Throwable t) {
+ logger.info("Ignore load query kill failure for ${queryId}:
${t.getMessage()}".toString())
+ }
+ }
+ }
+
+ // The API should expose both the response structure and a real running
pipeline/operator stack.
+ // Checking only scheduler frames would be weak because idle workers also
contain scheduler calls.
+ def hasExecutingOperatorStack = { String stackText ->
+ def operatorFrames = [
+ "VectorizedFnCall",
+ "AggSinkOperatorX",
+ "AggSourceOperatorX",
+ "FunctionStringDigest",
+ "CastToStringFunction"
+ ]
+ return stackText.split(/(?m)^----- thread /).any { section ->
+ section.contains("status=ok") &&
+ section.contains("(p_") &&
+ section.contains("PipelineTask::execute") &&
+ operatorFrames.any { frame -> section.contains(frame) }
+ }
+ }
+
+ def fetchStack = { String endpoint ->
+ def url =
"http://${endpoint}/api/stack_trace?mode=FAST&timeout_ms=3000&skip_blocking_syscalls=false"
+ return Http.GET(url, false, false) as String
+ }
+
+ try {
+ (1..queryThreadNum).each { index ->
+ futures.add(thread("be-stack-trace-load-${index}") {
+ try {
+ sql "set enable_sql_cache=false"
+ sql "set parallel_pipeline_task_num=4"
+ while (!stopLoad.get()) {
+ sql loadQuery
+ }
+ } catch (Throwable t) {
+ if (!stopLoad.get()) {
+ queryFailure.compareAndSet(null, t)
+ }
+ }
+ })
+ }
+
+ long queryDeadline = System.currentTimeMillis() +
TimeUnit.SECONDS.toMillis(30)
+ while (System.currentTimeMillis() < queryDeadline &&
findLoadQueryIds.call() == 0) {
+ if (queryFailure.get() != null) {
+ throw queryFailure.get()
+ }
+ sleep(200)
+ }
+ assertTrue(loadQueryIds.size() > 0, "Did not observe the load query in
processlist.")
+
+ String matchedStack = null
+ String lastStack = ""
+ long stackDeadline = System.currentTimeMillis() +
TimeUnit.SECONDS.toMillis(30)
+ while (System.currentTimeMillis() < stackDeadline && matchedStack ==
null) {
+ for (String endpoint : beEndpoints) {
+ def stack = fetchStack.call(endpoint)
+ lastStack = stack
+ assertTrue(stack.contains("BE thread stack traces"))
+ assertTrue(stack.contains("service_signal:"))
+ assertTrue(stack.contains("thread_count:"))
+ assertTrue(stack.contains("dwarf_location_info_mode: fast"))
+ assertTrue(stack.contains("skip_blocking_syscalls: false"))
+ assertTrue(stack.contains("summary:"))
+
+ if (hasExecutingOperatorStack.call(stack)) {
+ matchedStack = stack
+ logger.info("Matched BE stack trace operator evidence from
${endpoint}".toString())
+ break
+ }
+ }
+ if (queryFailure.get() != null) {
+ throw queryFailure.get()
+ }
+ if (matchedStack == null) {
+ sleep(500)
+ }
+ }
+
+ assertTrue(matchedStack != null,
+ "BE stack trace did not contain an executing pipeline/operator
stack. Last stack head:\n" +
+ lastStack.readLines().take(80).join("\n"))
+ } finally {
+ stopLoad.set(true)
+ killLoadQueries.call()
+ futures.each { future ->
+ future.get(60, TimeUnit.SECONDS)
+ }
+ }
+}
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]