On Fri, 29 May 2026 13:07:18 +0100
Gary Guo <[email protected]> wrote:

> On Fri May 29, 2026 at 7:57 AM BST, Onur Özkan wrote:
> >> >> > +#[pinned_drop]
> >> >> > +impl PinnedDrop for Srcu {
> >> >> > +    fn drop(self: Pin<&mut Self>) {
> >> >> > +        let ptr = self.inner.get();
> >> >> > +
> >> >> > +        // SAFETY: By the type invariants, `self` contains a valid 
> >> >> > and pinned `struct srcu_struct`
> >> >> > +        // and `srcu_readers_active()` only checks the active reader 
> >> >> > count.
> >> >> > +        if unsafe { bindings::srcu_readers_active(ptr) } {
> >> >> > +            crate::pr_warn!(
> >> >> > +                "Leaked `Guard` detected while dropping SRCU; drop 
> >> >> > will block forever.\n"
> >> >> > +            );
> >> 
> >> I think this could be a `warn_on` similar to how cleanup_srcu_struct 
> >> handle the
> >> condition.
> >
> > We also call cleanup_srcu_struct below. The idea was to provide additional
> > information, we don't need to call warn_on twice.
> 
> If the code blocks on `synchronize_srcu` then there's no call to
> `cleanup_srcu_struct`.


Ah right. I can do that in this case but honestly it's still more informative
with the current way. It explicitly tells you what the problem is.

> 
> >
> >> 
> >> >> > +        }
> >> >> > +
> >> >> > +        // `cleanup_srcu_struct()` may return early if readers are 
> >> >> > still active. Because `Srcu`
> >> >> > +        // owns the embedded `srcu_struct`, returning from `drop` in 
> >> >> > that state could free memory
> >> >> > +        // that is still referenced by the C side.
> >> >> > +        //
> >> >> > +        // Wait for all readers to complete first. If any `Guard` 
> >> >> > was leaked, `synchronize_srcu()`
> >> >> > +        // will sleep forever.
> >> >> > +        //
> >> >> > +        // SAFETY: By the type invariants, `self` contains a valid 
> >> >> > and pinned `struct srcu_struct`.
> >> >> > +        unsafe { bindings::synchronize_srcu(ptr) };
> >> >> 
> >> >> Sashiko got a good point here which is calling synchronize_srcu() only 
> >> >> if there
> >> >> are active readers. That's a nice low-effort improvement we can have in 
> >> >> the next
> >> >> version.
> >> >> 
> >> >> Onur
> >> >
> >> > Actually, now I am now thinking about whether we can come up with a 
> >> > better
> >> > approach when we detect leaked guards. Initially I came up with the
> >> > synchronize_srcu() solution because it would handle leaked guards 
> >> > automatically
> >> > without requiring any additional checks. But now that we can actually 
> >> > detect
> >> > whether guards are leaked the question becomes:
> >> >
> >> >  "Is there a better option than effectively sleeping forever when leaked
> >> >   guards are detected?"
> >> >
> >> > I have no plans for tomorrow other than finalizing this series including 
> >> > the
> >> > question above.
> >> 
> >> The best solution is to proceed cleanups anyway, given Rust rules ensure 
> >> that
> >> these are actual leaks and not just srcu read-side critical section that 
> >> failed
> >> to synchronize with the destruction of SRCU.
> >> 
> >> This obviously require changes to the SRCU code though.
> >
> >
> > The issue is difficult to fix purely from the C side. Once drop returns Rust
> > is free to destroy srcu_struct. If srcu still has pending callback 
> > associated
> > with that srcu_struct, for example from a future call_srcu() wrapper then
> > returning from drop while readers are active can turn into a UAF. There is 
> > also
> > no way to handle callbacks in a reasonable way in cleanup logic while there 
> > are
> > active readers.
> 
> Callbacks should be flushed during the drop due to srcu_barrier. Am I missing
> something?

No. Callbacks can only be invoked once the grace period has completed [1], which
can never happen while there is an active reader.

[1]: 
https://elixir.bootlin.com/linux/v7.1-rc5/source/kernel/rcu/srcutree.c#L1452-L1454

> 
> I'm pretty sure that, if we disregard potential misuses from C side, removing
> all "leak it" paths would be fine and won't leak to UAF if all users are from
> Rust side.
> 
> To be very clear, I am not advocating to actually implement this way. I agree
> with your conclusion below that this is broken code and a warning + blocking 
> is
> good enough. This is really just my thoughts on your "is there a better 
> option"
> question, and I think it's better in ideal world, but I think blocking is a
> good pragmatic choice.

I see. Maybe I should have phrased the question like "Is there a better option
with similar complexity" to be more clear.

Onur

> 
> Best,
> Gary
> 
> >
> > I mean in theory this could be fixed in the C code, but that would require 
> > to
> > re-write srcu cores/semantics for this special case. The $clean_something 
> > helper
> > would need know that the active readers are abandoned and will never unlock 
> > and
> > it would also need to decide what to do with the pending callbacks, which is
> > also a big problem (as gp will never complete, callbacks will never run).
> >
> > It's also worth to note that calling mem::forget on the srcu guard is WRONG
> > CODE and very easy to catch on review (by us and also Sashiko/any LLM). So
> > finding a solution that doesn't add too much complexity should be a key
> > consideration here. With that in mind, keeping the synchronize_srcu() not 
> > really
> > a bad solution. Sleeping forever is a bad failure mode, but it is better 
> > than a
> > potential UAF and either case requires sending a fix patch for the leaked 
> > guard
> > anyway.
> 

Reply via email to