On Fri May 29, 2026 at 1:29 PM BST, Onur Özkan wrote: > On Fri, 29 May 2026 13:07:18 +0100 > Gary Guo <[email protected]> wrote: > >> On Fri May 29, 2026 at 7:57 AM BST, Onur Özkan wrote: >> >> >> > +#[pinned_drop] >> >> >> > +impl PinnedDrop for Srcu { >> >> >> > + fn drop(self: Pin<&mut Self>) { >> >> >> > + let ptr = self.inner.get(); >> >> >> > + >> >> >> > + // SAFETY: By the type invariants, `self` contains a valid >> >> >> > and pinned `struct srcu_struct` >> >> >> > + // and `srcu_readers_active()` only checks the active >> >> >> > reader count. >> >> >> > + if unsafe { bindings::srcu_readers_active(ptr) } { >> >> >> > + crate::pr_warn!( >> >> >> > + "Leaked `Guard` detected while dropping SRCU; drop >> >> >> > will block forever.\n" >> >> >> > + ); >> >> >> >> I think this could be a `warn_on` similar to how cleanup_srcu_struct >> >> handle the >> >> condition. >> > >> > We also call cleanup_srcu_struct below. The idea was to provide additional >> > information, we don't need to call warn_on twice. >> >> If the code blocks on `synchronize_srcu` then there's no call to >> `cleanup_srcu_struct`. > > > Ah right. I can do that in this case but honestly it's still more informative > with the current way. It explicitly tells you what the problem is.
While the error message itself is not that informative (there's some potential to improve this, see https://lore.kernel.org/all/[email protected]/), the stack trace produced by a `warn_on` would have all the information and is more useful to troubleshoot than a `pr_warn` which doesn't tell you which `Srcu` being dropped is causing the issue. >> >> > >> >> > Actually, now I am now thinking about whether we can come up with a >> >> > better >> >> > approach when we detect leaked guards. Initially I came up with the >> >> > synchronize_srcu() solution because it would handle leaked guards >> >> > automatically >> >> > without requiring any additional checks. But now that we can actually >> >> > detect >> >> > whether guards are leaked the question becomes: >> >> > >> >> > "Is there a better option than effectively sleeping forever >> >> > when leaked >> >> > guards are detected?" >> >> > >> >> > I have no plans for tomorrow other than finalizing this series >> >> > including the >> >> > question above. >> >> >> >> The best solution is to proceed cleanups anyway, given Rust rules ensure >> >> that >> >> these are actual leaks and not just srcu read-side critical section that >> >> failed >> >> to synchronize with the destruction of SRCU. >> >> >> >> This obviously require changes to the SRCU code though. >> > >> > >> > The issue is difficult to fix purely from the C side. Once drop returns >> > Rust >> > is free to destroy srcu_struct. If srcu still has pending callback >> > associated >> > with that srcu_struct, for example from a future call_srcu() wrapper then >> > returning from drop while readers are active can turn into a UAF. There is >> > also >> > no way to handle callbacks in a reasonable way in cleanup logic while >> > there are >> > active readers. >> >> Callbacks should be flushed during the drop due to srcu_barrier. Am I missing >> something? > > No. Callbacks can only be invoked once the grace period has completed [1], > which > can never happen while there is an active reader. > > [1]: > https://elixir.bootlin.com/linux/v7.1-rc5/source/kernel/rcu/srcutree.c#L1452-L1454 Well, then srcu_barrier will not return. When `srcu_barrier` returns all in-flight SRCU callbacks must have been executed. Best, Gary > >> >> I'm pretty sure that, if we disregard potential misuses from C side, removing >> all "leak it" paths would be fine and won't leak to UAF if all users are from >> Rust side. >> >> To be very clear, I am not advocating to actually implement this way. I agree >> with your conclusion below that this is broken code and a warning + blocking >> is >> good enough. This is really just my thoughts on your "is there a better >> option" >> question, and I think it's better in ideal world, but I think blocking is a >> good pragmatic choice. > > I see. Maybe I should have phrased the question like "Is there a better option > with similar complexity" to be more clear.

