liaoxin01 opened a new pull request, #64574: URL: https://github.com/apache/doris/pull/64574
## Problem

The tablet header lock (`_meta_lock`) is a `std::shared_mutex`, which under libstdc++ wraps `pthread_rwlock_t` and is **thread-affine**: unlocking it from an OS thread other than the one that acquired it is undefined behavior.

In cloud mode the header **write** lock can be held across a **suspending** call. Concretely, when a query-driven rowset sync pulls an overlapping (compacted) rowset, `CloudTablet::add_rowsets` warms up the new remote file **while holding the write lock**; on a cold-restarted BE, resolving the storage vault / building the S3 client issues a meta-service RPC. The holding bthread suspends and may **migrate to another worker pthread**, so the matching unlock runs on a **different OS thread**. This corrupts the glibc rwlock (the write-locked bit is left set with no owner), **permanently wedging** the lock: all readers/writers on that tablet pile up and queries time out (~90s).

Observed in graceful-restart tests: a tablet's header lock stuck with `active_writer=[none]` yet every `try_lock`/`try_lock_shared` failing for >70 min, and the last writer's acquire OS-tid differing from its release OS-tid (proof of the cross-thread unlock).

## Fix

Replace `_meta_lock` with `BthreadSharedMutex`, a port of libc++'s `std::shared_mutex` (the two-gate condition-variable algorithm) onto `bthread::Mutex` / `bthread::ConditionVariable`:

- Ownership is an integer state guarded by a briefly held internal mutex and carries **no OS-thread identity**, so locking on one worker and unlocking on another after a bthread migration is well defined; the permanent wedge can no longer happen.
- Waiting blocks on a bthread condition variable, suspending the bthread instead of blocking the worker.
- Writer-preferring; satisfies the C++ SharedMutex requirements, so it is a drop-in with `std::unique_lock` / `std::shared_lock`.

Only the tablet header lock is switched; unrelated `std::shared_mutex` members (e.g. `TabletMeta::_meta_lock`) are left untouched.
Call sites that named the type explicitly are converted to class template argument deduction or to `BthreadSharedMutex`.
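
For reviewers unfamiliar with the two-gate algorithm, here is a minimal sketch of the idea. It is not the code in this PR: `TwoGateSharedMutex` and `cross_thread_unlock_demo` are hypothetical names, and `std::mutex` / `std::condition_variable` stand in for `bthread::Mutex` / `bthread::ConditionVariable` so the example builds without brpc. The lock state is a plain integer, so a writer may acquire on one thread and release on another; the internal mutex is only ever locked and unlocked inside a single call.

```cpp
#include <cassert>
#include <climits>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-in, NOT the PR's BthreadSharedMutex: std primitives
// replace bthread::Mutex / bthread::ConditionVariable for illustration.
class TwoGateSharedMutex {
public:
    void lock() {
        std::unique_lock<std::mutex> lk(_mu);
        // Gate 1: wait until no other writer has entered.
        _gate1.wait(lk, [this] { return (_state & kWriteEntered) == 0; });
        _state |= kWriteEntered; // new readers are held back from here on
        // Gate 2: wait for readers that are already inside to drain.
        _gate2.wait(lk, [this] { return (_state & kReaderMask) == 0; });
    }
    bool try_lock() {
        std::lock_guard<std::mutex> lk(_mu);
        if (_state != 0) return false;
        _state = kWriteEntered;
        return true;
    }
    void unlock() {
        {
            std::lock_guard<std::mutex> lk(_mu);
            _state = 0; // no owner identity is stored, only this integer
        }
        _gate1.notify_all();
    }
    void lock_shared() {
        std::unique_lock<std::mutex> lk(_mu);
        _gate1.wait(lk, [this] { return can_enter_shared(); });
        ++_state; // reader count lives in the low bits
    }
    bool try_lock_shared() {
        std::lock_guard<std::mutex> lk(_mu);
        if (!can_enter_shared()) return false;
        ++_state;
        return true;
    }
    void unlock_shared() {
        std::lock_guard<std::mutex> lk(_mu);
        --_state;
        const unsigned readers = _state & kReaderMask;
        if (_state & kWriteEntered) {
            if (readers == 0) _gate2.notify_one(); // last reader wakes the writer
        } else if (readers == kReaderMask - 1) {
            _gate1.notify_one(); // reader count dropped below the maximum
        }
    }

private:
    bool can_enter_shared() const {
        return (_state & kWriteEntered) == 0 &&
               (_state & kReaderMask) != kReaderMask;
    }
    static constexpr unsigned kWriteEntered = 1U << (sizeof(unsigned) * CHAR_BIT - 1);
    static constexpr unsigned kReaderMask = ~kWriteEntered;
    std::mutex _mu;
    std::condition_variable _gate1;
    std::condition_variable _gate2;
    unsigned _state = 0;
};

// Acquire the write lock on one OS thread and release it on another, which is
// the pattern a migrated bthread produces.
bool cross_thread_unlock_demo() {
    TwoGateSharedMutex m;
    std::thread acquirer([&] { m.lock(); });
    acquirer.join();
    const bool writer_blocks_readers = !m.try_lock_shared();
    assert(writer_blocks_readers);
    std::thread releaser([&] { m.unlock(); });
    releaser.join();
    std::shared_lock reader(m); // CTAD: no explicit mutex type at the call site
    const bool reader_blocks_writer = !m.try_lock();
    return writer_blocks_readers && reader_blocks_writer;
}
```

The `std::shared_lock reader(m);` line also illustrates why call sites written with class template argument deduction need no change when the mutex type is swapped.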
