On Mon, 11 May 2026 20:11:26 +0300 Onur Özkan <[email protected]> wrote:
> On Sun, 03 May 2026 20:25:03 +0100 > Gary Guo <[email protected]> wrote: > > > On Sun May 3, 2026 at 4:39 AM BST, Onur Özkan wrote: > > >> > +/// Sleepable read-copy update primitive. > > >> > +/// > > >> > +/// SRCU readers may sleep while holding the read-side guard. > > >> > +/// > > >> > +/// The destructor may sleep. > > >> > +/// > > >> > +/// # Invariants > > >> > +/// > > >> > +/// This represents a valid `struct srcu_struct` initialized by the C > > >> > SRCU API > > >> > +/// and it remains pinned and valid until the pinned destructor runs. > > >> > +#[repr(transparent)] > > >> > +#[pin_data(PinnedDrop)] > > >> > +pub struct Srcu { > > >> > + #[pin] > > >> > + inner: Opaque<bindings::srcu_struct>, > > >> > +} > > >> > + > > >> > +impl Srcu { > > >> > + /// Creates a new SRCU instance. > > >> > + #[inline] > > >> > + pub fn new(name: &'static CStr, key: Pin<&'static LockClassKey>) > > >> > -> impl PinInit<Self, Error> { > > >> > + try_pin_init!(Self { > > >> > + inner <- Opaque::try_ffi_init(|ptr: *mut > > >> > bindings::srcu_struct| { > > >> > + // SAFETY: `ptr` points to valid uninitialised memory > > >> > for a `srcu_struct`. > > >> > + to_result(unsafe { > > >> > + bindings::init_srcu_struct_with_key(ptr, > > >> > name.as_char_ptr(), key.as_ptr()) > > >> > + }) > > >> > + }), > > >> > + }) > > >> > + } > > >> > + > > >> > + /// Enters an SRCU read-side critical section. > > >> > + /// > > >> > + /// # Safety > > >> > + /// > > >> > + /// The returned [`Guard`] must not be leaked. Leaking it with > > >> > [`core::mem::forget`] > > >> > + /// leaves the SRCU read-side critical section active. > > >> > > >> I generally would prefer if we could use guard-like API instead of > > >> forcing a > > >> callback. > > > > > > Me too and developers can still do that. I think the safety contract here > > > is > > > very simple to handle. It's essentially this: > > > > > > // SAFETY: Guard is not leaked. > > > let _guard = unsafe { x.read_lock() }; > > > > > > To me it's very simple and straightforward for both the developer and the > > > reviewer. It doesn't add any overhead to the implementation and it ensures > > > that the developer (and later the reviewer) is aware of the potential > > > issue. > > > > > > Of course, there's also the safe option if the developer is happy with > > > closure-based API: > > > > > > x.with_read_lock(|_guard| { > > > ... > > > }); > > > > > > So it allows you to use the guard-based approach directly with the > > > requirement > > > of a safety comment and also provides a safe API for developers who don't > > > want > > > to deal with that. I am not sure if you fall into the third category, > > > which is > > > "I don't like writing safety comments and I don't like the closure-based > > > approach" :) > > > > We have been avoiding using callback-based API if there's an alternatively > > way > > to achieve this. There has been quite a very precedents with this: > > > > - spin_lock_irqsave requires taking and releasing in correct order, which is > > easy to solve with a callback approach. The same logic reasoning can be > > used > > to provide an unsafe API + safe callback API, but Boqun & Lyude reworked > > the > > spinlock IRQ design so we don't need that anymore. > > > > - `Task::current` API could easily be replaced callback-based approach, but > > we > > used a macro to achieve without unsafe. > > > > The API here is not inherently impossible to use guard API. It's not safe > > today > > because a very specific detail. The callback-API is the "path of least > > resistence" approach, but it's not the optimal one. > > > > > > > >> > > >> I suppose the only reason that this is unsafe is the "just leak it" > > >> condition > > >> when cleaning up SRCU struct, which skips cleaning up delayed work, > > >> which could > > >> call into `process_srcu`, which accesses `srcu_struct`. This however is > > >> *not* > > >> leaked, because it's controlled by the user. Only the auxillary data > > >> allocated > > >> by SRCU is leaked. So UAF is going to happen. > > >> > > >> So in some aspect, the leak caused by `srcu_readers_active(ssp)` can > > >> cause more > > >> damage compared to just continuing cleanup despite active users? I think > > >> this > > >> could be changed in one of these ways: > > >> * Have SRCU allocate all memory instead, and user-side would just have a > > >> `struct srcu_struct*`; then leaking would be safe. This is probably a > > >> bit > > >> difficult to change because it affects many users. > > > > > > We could do the same on the Rust side only. Basically instead of embedding > > > srcu_struct in Rust srcu, allocate it separately and store its pointer. > > > Then, > > > if cleanup hits the active reader case, we could leak that allocation so > > > any > > > remaining srcu work does not hit UAF. I was aware of this option but I > > > would > > > prefer to avoid it because it adds an extra allocation for every Rust > > > srcu. > > > > > >> * Continue to flush delayed work and stop the timer, and then leak > > >> before the > > >> actual kfree happens. > > > > > > Can you say more? I didn't understand this particular solution. > > > > I was thinking that doing this _might_ be sufficient. I have to admit that > > I've > > not very familar with the internal implementation of SRCU to make it an > > assertion though. > > > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c > > index 0d01cd8c4b4a..5d75a4dbb6e5 100644 > > --- a/kernel/rcu/srcutree.c > > +++ b/kernel/rcu/srcutree.c > > @@ -717,8 +717,6 @@ void cleanup_srcu_struct(struct srcu_struct *ssp) > > raw_spin_unlock_irq_rcu_node(ssp->srcu_sup); > > if (WARN_ON(!delay)) > > return; /* Just leak it! */ > > - if (WARN_ON(srcu_readers_active(ssp))) > > - return; /* Just leak it! */ > > /* Wait for irq_work to finish first as it may queue a new work. */ > > irq_work_sync(&sup->irq_work); > > flush_delayed_work(&sup->work); > > > > But after taking another look, I am not even sure if this is needed. A quick > > glance of the code it appears that __srcu_read_unlock doesn't do anything > > apart > > from adjusting the counter, and the SRCU grace period and thus the timers > > won't > > actually start unless there's a pending grace period, which won't start > > unless > > there's a call_srcu or sychronize_srcu. And we *know* that none of them > > would > > happen, as the lifetime guarantees that nothing accesses the `Srcu` struct > > when > > `drop` starts, and inside drop we have already invoked `srcu_barrier()`. > > > > So I think, even if we hit the "Just leak it" scenario, we can still safely > > deallocate the backing storage of `srcu_struct` and nothing should break? > > > > > > > >> * Trigger a `BUG` when the leak condition is hit for Rust users. > > > > > > We need an atomic counter to detect the leak and I thought that would be > > > too > > > much overhead for this abstraction. Basically each lock and drop will > > > call an > > > atomic operation so. > > > > You could just check if srcu_sup is NULL after calling > > `cleanup_srcu_struct`. > > > > Best, > > Gary > > > > > > > >> * Declare the `WARN_ON` to be a sufficient protection and say this can be > > >> considered safe. Kinda similar to the strategy we take to the > > >> sleep-inside-atomic context issue. > > > > > > I think this is a rather weak precaution. > > > > > > > Alright, here is the alternative approach I just figured and I think this > makes > the most sense compared to the ones we discussed in this series: > > #[pinned_drop] > impl PinnedDrop for Srcu { > fn drop(self: Pin<&mut Self>) { > let ptr = self.inner.get(); > > // `cleanup_srcu_struct()` may return early if readers > are still active. Because `Srcu` > // owns the embedded `srcu_struct`, returning from > `drop` in that state could free memory > // that is still referenced by the C side. > // > // Wait for all readers to complete first. If any > `Guard` was leaked, `synchronize_srcu()` > // will sleep forever. > // > // SAFETY: By the type invariants, `self` contains a > valid and pinned `struct srcu_struct`. > unsafe { bindings::synchronize_srcu(ptr) }; > > // ... > > (the code snippet above is copied from [1]) > > As explained in the doc comments, we avoid the potential UAFs by sleeping > forever which is considered non-unsafe, right? Alice said during today's > call that "sleeping is not good, but it is not unsafe". If that's the case, > I think this fixes the overall problem and we can make read_unlock safe again. > > What do you think? I will hold the next version until we reach a common point. I will be less available in the coming week and since there has been no response to my proposal here, I included it in v3 and sent it [1]. [1]: https://lore.kernel.org/all/[email protected] Regards, Onur > > [1]: https://github.com/onur-ozkan/linux/commit/28d9739f03 > > - Onur

