ojalberts-itc commented on issue #64708:
URL: https://github.com/apache/doris/issues/64708#issuecomment-4775094553
## Correction: the wedge **does** reproduce reliably once enough multi-replica write history accumulates (fresh in-process capture below)
My earlier update said we "could not reproduce the wedge on a fresh
cluster." That was premature, and I want to correct it with evidence.
On the **same fresh-from-Terraform 4.0.6 control cluster** (cause-#1 8040 SG
fix present from t=0), after a day of accumulated multi-replica write activity,
the wedge now reproduces **reliably on essentially every stock (multi-replica)
write**. We had been keeping the cluster usable with
`experimental_enable_single_replica_insert=true`; the moment we set it back to
`false` and issued a **single repl=3 `INSERT`**, the write path wedged
immediately. Reads and the 9050 heartbeat stayed up, and `SHOW BACKENDS`
reported `Alive=true` for every BE throughout. Recovery still requires a
**full BE-fleet restart**; restarting a single BE does not clear it.
We captured the in-process state on all 4 BEs at the live wedge
(2026-06-23), before any restart.
### 1. Parked write thread — the brpc load-stream OPEN never returns
`gstack` on a coordinator BE (full dump = 1746 threads) shows a write thread
parked exactly here:
```
#3  bthread_id_join ()
#4  brpc::Channel::CallMethod(...)
#5  doris::FailureDetectChannel::CallMethod (...)
      be/src/util/brpc_client_cache.h:121
#6  doris::LoadStreamStub::open (..., txn_id=554, total_streams=2, idle_timeout_ms=14400000)
      be/src/vec/sink/load_stream_stub.cpp:195
#7  doris::LoadStreamStubs::open (...)
      be/src/vec/sink/load_stream_stub.cpp:574
#8  doris::vectorized::VTabletWriterV2::_open_streams_to_backend (dst_id=1782147896322, ...)
      be/src/vec/sink/writer/vtablet_writer_v2.cpp:317
#9  doris::vectorized::VTabletWriterV2::_open_streams (...)
      be/src/vec/sink/writer/vtablet_writer_v2.cpp:296
#10 doris::vectorized::VTabletWriterV2::open (...)
      be/src/vec/sink/writer/vtablet_writer_v2.cpp:272
#11 doris::vectorized::AsyncResultWriter::process_block (...)
      be/src/vec/sink/writer/async_result_writer.cpp:119
#12 doris::vectorized::AsyncResultWriter::start_writer(...)::$_0::operator()()
      be/src/vec/sink/writer/async_result_writer.cpp:105
#16 doris::ThreadPool::dispatch_thread (...)
      be/src/util/threadpool.cpp:616
```
(Build path `/home/zcp/repo_center/doris_release/doris/be/src/...` confirms
the official 4.0.6 GA build, `doris-4.0.6-rc02-1663f25c16f`.)
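
To see how many of the 1746 threads are parked on the same frames, a dump can be scanned per thread. The following is a minimal sketch, not part of our capture tooling: it assumes the dump uses the usual `Thread N (...)` header line that `gstack`/`pstack` print before each thread's frames, and the function name `count_parked_threads` is hypothetical.

```python
import re

# Header that gstack/pstack print before each thread's frames, e.g.
# "Thread 12 (Thread 0x7f... (LWP 4242)):" (assumed dump format).
THREAD_HEADER = re.compile(r"^Thread \d+ ", re.MULTILINE)


def count_parked_threads(dump_text, required_frames):
    """Count threads whose stack text contains every given substring."""
    # One chunk per thread; the first chunk is whatever precedes the
    # first header (possibly empty) and is skipped.
    chunks = THREAD_HEADER.split(dump_text)[1:]
    return sum(
        1
        for chunk in chunks
        if all(frame in chunk for frame in required_frames)
    )
```

Calling it with `["bthread_id_join", "LoadStreamStub::open"]` counts only threads parked inside the load-stream open, rather than every thread waiting on some other brpc call.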
### 2. `[E1008]` on `:8060` including the in-process loopback
`be.WARNING` on the wedged target BE (`10.0.0.209`, backend id
`1782147896323`) shows it cancelling a load-stream whose **source and
destination are the same backend** — a BE that cannot open a load-stream to its
own `:8060`:
```
load_stream_stub.cpp:591 open stream failed: [INTERNAL_ERROR]Failed to connect to backend 1782147896323: [E1008]Reached timeout=60000ms @10.0.0.209:8060; stream: load_id=..., src_id=1782147896323, dst_id=1782147896323, stream_id=...
brpc_closure.h:128 RPC meet failed: [E1008]Reached timeout=533998ms @10.0.0.209:8060
```
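
The loopback case can be picked out of a larger `be.WARNING` tail mechanically. This is a rough sketch that assumes the input has already been filtered down to `open stream failed` entries and that the `src_id=..., dst_id=...` pair appears as in the excerpt above; the function name `loopback_stream_ids` is hypothetical.

```python
import re

# Matches the "src_id=<n>, dst_id=<n>" pair as it appears in the excerpt.
STREAM_IDS = re.compile(r"src_id=(\d+),\s*dst_id=(\d+)")


def loopback_stream_ids(log_text):
    """Return backend ids of entries where source and destination match."""
    return [
        src
        for src, dst in STREAM_IDS.findall(log_text)
        if src == dst
    ]
```

A non-empty result means at least one BE failed to open a load-stream to itself, which rules out a purely peer-to-peer network explanation for those entries.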
Raw TCP to `:8060` (peer and loopback) is OPEN throughout, so the stall is in
the brpc application layer, not the socket/TCP/SG. The `src_id == dst_id` pair
in the first log entry is the loopback proof, and the ~534s park matches the
`timeout=533998ms` RPC timeout in the second.
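
For anyone repeating the raw TCP check, a plain connect attempt is enough to separate "port unreachable" from "port open but brpc not answering". The sketch below is illustrative only: it uses the standard-library `socket.create_connection`, the helper name `tcp_port_open` is hypothetical, and the host and port would be each BE's address and `8060`.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes in time."""
    try:
        # The connection is closed again as soon as the handshake succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

A `True` result here together with the `[E1008]` timeouts above is what points at the RPC layer: the TCP handshake completes, but the load-stream open still never returns.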
### 3. Workers are parked, not saturated
brpc `/vars` (`:8060`) at the wedge:
```
bthread_worker_usage : 1.14142
bthread_count : 5
load_stream_count : 1–3
rpc_server_8060_connection_count : 13–14
```
The write threads are blocked on the RPC, not starved for workers: a
`bthread_worker_usage` of about 1.14 and a `bthread_count` of 5 are far below
any worker ceiling.
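
The parked-versus-saturated distinction can be checked against a saved `/vars` page. This is a sketch under stated assumptions: it treats the page as `name : value` lines like the excerpt above, it treats `bthread_worker_usage` as roughly the number of busy workers, and the `worker_count` argument is a placeholder for the BE's configured bthread worker count, which is not given in this comment. Both function names are hypothetical.

```python
def parse_vars(text):
    """Parse 'name : value' lines into a dict of raw string values."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" : ")
        if sep:  # skip lines that do not have the 'name : value' shape
            values[name.strip()] = value.strip()
    return values


def workers_saturated(vars_dict, worker_count, threshold=0.9):
    """True if worker usage is at or above threshold * worker_count."""
    # worker_count is an assumed input; read it from the BE's own config.
    usage = float(vars_dict["bthread_worker_usage"])
    return usage >= threshold * worker_count
```

Values are kept as strings because some entries in the excerpt are observed ranges (for example `1–3`) rather than single numbers; only `bthread_worker_usage` is converted.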
### 4. Zero-clone wedge (isolated from cause #1)
On the FE side, all BEs report `Alive=true`, and both `SHOW PROC
'/cluster_balance/running_tablets'` and `SHOW PROC
'/cluster_balance/pending_tablets'` are **empty**, so there is no clone
activity. This is purely the brpc load-stream socket going Broken and never
reviving (the brpc #1168 class), not the cause-#1 clone storm.
### Full dumps available
Per-BE `tar.gz` archives (full `pstack` × 1746 threads, complete `/vars`,
`/rpcz`, `/metrics`, `be.WARNING` tail) and the FE `cluster_balance`/`statistic`
output are ready. I can attach them to this issue if that is the most useful
form; the inline excerpts above are the load-bearing parts.
### Next: testing 4.1.2
We are about to switch this cluster to **4.1.2** and run the same test
against it. We'll post an update with the 4.1.2 result as soon as that is done.
The workaround that unblocks heavy loads in the meantime is
`experimental_enable_single_replica_insert=true`, which writes one replica and
then clones the rest over the 8040 path, sidestepping the BE-to-BE 8060
load-stream opens. It is a workaround, not a fix. Original Question 1 stands:
is this a known brpc 1.4.0 load-stream socket defect, and is there a fixing PR
or a version that resolves it?