ojalberts-itc commented on issue #64708:
URL: https://github.com/apache/doris/issues/64708#issuecomment-4775094553

   ## Correction: the wedge **does** reproduce reliably once enough multi-replica write history accumulates, and here is a fresh in-process capture
   
   My earlier update said we "could not reproduce the wedge on a fresh 
cluster." That was premature, and I want to correct it with evidence.
   
   On the **same fresh-from-Terraform 4.0.6 control cluster** (cause-#1 8040 SG 
fix present from t=0), after a day of accumulated multi-replica write activity, 
the wedge now reproduces **reliably on essentially every stock (multi-replica) 
write**. We had been keeping the cluster usable with 
`experimental_enable_single_replica_insert=true`; the moment we set it back to 
`false` and issued a **single repl=3 `INSERT`**, the write path wedged 
immediately. Reads and the 9050 heartbeat stayed up; `SHOW BACKENDS` = all 
`Alive=true` throughout. Recovery is still a **full BE-fleet restart** (a 
single-BE restart does not clear it).
   
   We captured the in-process state on all 4 BEs at the live wedge 
(2026-06-23), before any restart.
   
   ### 1. Parked write thread — the brpc load-stream OPEN never returns
   
   `gstack` on a coordinator BE (full dump = 1746 threads) shows a write thread 
parked exactly here:
   
   ```
   #3  bthread_id_join ()
   #4  brpc::Channel::CallMethod(...)
   #5  doris::FailureDetectChannel::CallMethod (...)
         be/src/util/brpc_client_cache.h:121
   #6  doris::LoadStreamStub::open (..., txn_id=554, total_streams=2, idle_timeout_ms=14400000)
         be/src/vec/sink/load_stream_stub.cpp:195
   #7  doris::LoadStreamStubs::open (...)
         be/src/vec/sink/load_stream_stub.cpp:574
   #8  doris::vectorized::VTabletWriterV2::_open_streams_to_backend (dst_id=1782147896322, ...)
         be/src/vec/sink/writer/vtablet_writer_v2.cpp:317
   #9  doris::vectorized::VTabletWriterV2::_open_streams (...)
         be/src/vec/sink/writer/vtablet_writer_v2.cpp:296
   #10 doris::vectorized::VTabletWriterV2::open (...)
         be/src/vec/sink/writer/vtablet_writer_v2.cpp:272
   #11 doris::vectorized::AsyncResultWriter::process_block (...)
         be/src/vec/sink/writer/async_result_writer.cpp:119
   #12 doris::vectorized::AsyncResultWriter::start_writer(...)::$_0::operator()()
         be/src/vec/sink/writer/async_result_writer.cpp:105
   #16 doris::ThreadPool::dispatch_thread (...)
         be/src/util/threadpool.cpp:616
   ```
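
For anyone triaging their own dump, the parked-thread pattern above can be searched for mechanically. This is a minimal sketch, assuming a gstack-style text dump in which each thread starts with a `Thread N (...)` header line; the function names are hypothetical and are not part of Doris or brpc.

```python
import re

# Assumed gstack header shape: "Thread 12 (Thread 0x7f... (LWP 345)):"
THREAD_HEADER = re.compile(r"^Thread (\d+) ")


def split_threads(dump_text):
    """Group the frame lines of a dump by gstack thread number."""
    threads = {}
    current = None
    for line in dump_text.splitlines():
        stripped = line.strip()
        match = THREAD_HEADER.match(stripped)
        if match:
            current = int(match.group(1))
            threads[current] = []
        elif current is not None:
            threads[current].append(stripped)
    return threads


def parked_load_stream_threads(dump_text):
    """Return thread numbers whose stack holds both marker frames:
    bthread_id_join (waiting on the RPC) and LoadStreamStub::open."""
    hits = []
    for tid, frames in split_threads(dump_text).items():
        waiting = any("bthread_id_join" in f for f in frames)
        opening = any("LoadStreamStub::open" in f for f in frames)
        if waiting and opening:
            hits.append(tid)
    return sorted(hits)
```

Running `parked_load_stream_threads(open("be.gstack").read())` (file name illustrative) on each BE's dump gives a quick count of write threads stuck in the load-stream open.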
   
   (Build path `/home/zcp/repo_center/doris_release/doris/be/src/...` confirms 
the official 4.0.6 GA build, `doris-4.0.6-rc02-1663f25c16f`.)
   
   ### 2. `[E1008]` on `:8060` including the in-process loopback
   
   `be.WARNING` on the wedged target BE (`10.0.0.209`, backend id 
`1782147896323`) shows it cancelling a load-stream whose **source and 
destination are the same backend** — a BE that cannot open a load-stream to its 
own `:8060`:
   
   ```
   load_stream_stub.cpp:591  open stream failed: [INTERNAL_ERROR]Failed to connect to backend
     1782147896323: [E1008]Reached timeout=60000ms @10.0.0.209:8060
     ; stream: load_id=..., src_id=1782147896323, dst_id=1782147896323, stream_id=...
   brpc_closure.h:128  RPC meet failed: [E1008]Reached timeout=533998ms @10.0.0.209:8060
   ```
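
The loopback case (`src_id == dst_id`) in the excerpt above can be picked out of a `be.WARNING` tail with a few lines of Python. This is only a sketch: it assumes the record layout shown in the excerpt, and `loopback_failures` is a hypothetical helper name. It attributes the first `src_id`/`dst_id` pair found after each `open stream failed` marker to that failure, which may misattribute ids if unrelated records are interleaved.

```python
import re

# Matches the "src_id=<n>, dst_id=<n>" pair as it appears in the excerpt.
STREAM_IDS = re.compile(r"src_id=(\d+),\s*dst_id=(\d+)")


def loopback_failures(log_text):
    """Return backend ids that failed to open a load stream to themselves."""
    # Collapse whitespace so records wrapped over several lines still match.
    flat = " ".join(log_text.split())
    ids = set()
    # Each chunk is the text following one "open stream failed" marker.
    for chunk in flat.split("open stream failed")[1:]:
        match = STREAM_IDS.search(chunk)
        if match and match.group(1) == match.group(2):
            ids.add(int(match.group(1)))
    return sorted(ids)
```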
   
   Raw TCP to `:8060` (both peer and loopback) stays open throughout, so the 
stall is in the brpc application layer, not in the socket/TCP/SG path. The 
`src_id == dst_id` record is the loopback proof, and the ~534 s park matches the 
`timeout=533998ms` RPC timeout logged at `brpc_closure.h:128`.
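
A raw TCP check of this kind can be reproduced with a plain socket connect. The sketch below is an assumption about how one might do it, not the exact tool we used; `tcp_port_open` is a hypothetical helper, and it only proves that the TCP handshake completes, nothing about brpc itself.

```python
import socket


def tcp_port_open(host, port, timeout_s=3.0):
    """Return True if a plain TCP connect to host:port completes in time."""
    try:
        # create_connection performs the handshake; closing immediately is
        # enough, because only reachability is being tested here.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `tcp_port_open("10.0.0.209", 8060)` returning `True` while the load-stream open still hits `[E1008]` is consistent with a stall above the TCP layer.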
   
   ### 3. Workers are parked, not saturated
   
   brpc `/vars` (`:8060`) at the wedge:
   
   ```
   bthread_worker_usage           : 1.14142
   bthread_count                  : 5
   load_stream_count              : 1–3
   rpc_server_8060_connection_count : 13–14
   ```
   
   The write threads are blocked on the RPC rather than starved for workers: a 
`bthread_worker_usage` of about 1.14 and a `bthread_count` of 5 are far below 
any worker ceiling.
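
To compare these values across BEs, the `/vars` text can be parsed into a dictionary. This is a rough sketch that assumes the `name : value` line layout shown in the excerpt; `parse_vars`, `workers_look_idle`, and the `worker_ceiling` parameter are hypothetical names, and the ceiling must come from your own BE configuration.

```python
def parse_vars(text):
    """Parse 'name : value' lines (as in the excerpt) into a dict of strings."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" : ")
        if sep:  # skip lines that do not have the separator
            values[name.strip()] = value.strip()
    return values


def workers_look_idle(vars_map, worker_ceiling):
    """True when bthread_worker_usage is below the supplied ceiling.

    worker_ceiling is an assumption supplied by the caller, for example
    the number of bthread workers configured on the BE.
    """
    return float(vars_map["bthread_worker_usage"]) < worker_ceiling
```

Values are kept as strings because some entries in a capture may be ranges rather than single numbers.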
   
   ### 4. Zero-clone wedge (isolated from cause #1)
   
   On the FE side, all BEs report `Alive=true`, and `SHOW PROC 
'/cluster_balance/{running,pending}_tablets'` returns **empty** results, so 
there is no clone activity. This is purely the brpc load-stream socket going 
Broken and never reviving (the brpc #1168 class), not the cause-#1 clone storm.
   
   ### Full dumps available
   
   Per-BE `tar.gz` (full `pstack` × 1746 threads, complete `/vars`, `/rpcz`, 
`/metrics`, `be.WARNING` tail) and FE `cluster_balance`/`statistic` are ready — 
I can drag-drop them onto this issue if that's the most useful form (the inline 
excerpts above are the load-bearing parts).
   
   ### Next: testing 4.1.2
   
   We are about to switch this cluster to **4.1.2** and run the same test 
against it. We'll post an update with the 4.1.2 result as soon as that is done.
   
   The workaround that unblocks heavy loads in the meantime is 
`experimental_enable_single_replica_insert=true`. It writes one replica and 
then clones the rest over the 8040 path, which sidesteps the BE-to-BE 8060 
load-stream opens; it is a workaround, not a fix. Original Question 1 stands: 
is this a known brpc 1.4.0 load-stream socket defect, and is there a fixing PR 
or a version that resolves it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

