liaoxin01 commented on PR #64221:
URL: https://github.com/apache/doris/pull/64221#issuecomment-4770078158

   ### Concern: v2 uses a process-wide global static for the close-wait 
CV/version
   
   In `load_stream_stub.cpp`, the wakeup primitives are function-local statics:
   
   ```cpp
   bthread::Mutex& close_wait_mutex();            // one per process
   bthread::ConditionVariable& close_wait_cv();   // one per process
   std::atomic<int64_t>& close_wait_version_value(); // one per process
   ```
   
   So `notify_close_wait()` / `wait_for_close_event()` are shared by **every 
concurrent load on the BE**. This has two downsides:
   
   1. **Thundering herd across unrelated loads.** When any single stream 
closes, `notify_all()` wakes *all* writers currently in `_close_wait`, each of 
which re-scans its own streams and goes back to sleep.
   2. **Global version pollution.** The version counter is global, so a stream 
close from an unrelated load bumps it and makes `wait_for_close_event` of every 
other load return immediately and re-loop.
   
   Under high load concurrency the global `notify_all` can fire much more 
frequently than every 10ms, so the event-driven path may actually burn *more* 
CPU than the fixed 10ms poll it replaces — the opposite of this PR's goal.
   
   Note the inconsistency with v1: `IndexChannel` scopes the CV/version/mutex 
to the channel instance (`_close_wait_cv`, `_close_wait_version`), which is 
clean. v2 should do the analogous thing.
   
   **Suggested fix:** scope the signal to one load instead of the whole 
process. The natural owner is `LoadStreamMap` (obtained from the pool per 
`load_id`+`src_id`), which is exactly the shared owner of this batch of stubs. 
Concretely: put a small `CloseWaitNotifier { atomic<int64_t> version; 
bthread::Mutex; bthread::ConditionVariable; }` (same logic as today, just 
instance-scoped) on `LoadStreamMap`, inject a `shared_ptr` of it into each 
`LoadStreamStub` at creation, have `on_closed`/`cancel` call the stub's 
instance `notify_close_wait()`, and have `VTabletWriterV2::_close_wait` wait on 
`_load_stream_map`'s notifier.
   
   This keeps the monotonic read-only version (still required, since multiple 
writers can share one `LoadStreamMap` and must not consume each other's 
wakeups), preserves all the lost-wakeup/timeout behavior, and eliminates the 
cross-load thundering herd. It also mirrors v1's per-`IndexChannel` scoping.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to