> On 5 Jun 2026, at 04:56, Philipp Stanner <[email protected]> wrote: > > On Thu, 2026-06-04 at 10:15 +0200, Boris Brezillon wrote: >> On Wed, 3 Jun 2026 21:43:05 -0300 >> Daniel Almeida <[email protected]> wrote: >> >>>> On 3 Jun 2026, at 14:14, Boris Brezillon <[email protected]> >>>> wrote: >>>> >>>> On Wed, 3 Jun 2026 13:41:02 -0300 >>>> Daniel Almeida <[email protected]> wrote: >>>> >>>>>> + /// Called when the fence is signaled. >>>>>> + /// >>>>>> + /// This is called from the fence signaling path, which may be in >>>>>> interrupt >>>>>> + /// context or with locks held, which is why `self` is only >>>>>> borrowed, so that >>>>>> + /// it cannot drop. Implementations must not sleep or perform >>>>>> + /// long-running operations. >>>>>> + /// >>>>>> + /// An implementation likely wants to inform itself (e.g., through >>>>>> a work item) >>>>>> + /// within this callback that the associated >>>>>> [`FenceCbRegistration`] can now be >>>>>> + /// dropped. >>>>>> + fn called(&mut self); >>>>> >>>>> This is a central point. We ideally would want this to consume self, >>>>> because we >>>>> may want to move things out of the callback. >>>> >>>> This one comes from me. The rationale being that ::called() is called >>>> from an atomic context, and the resources attached to the callback data >>>> might require acquiring other sleeping locks to be released, and >>>> sometimes you don't even notice immediately because said resources are >>>> refcounted, and the lock is only acquired when you happen to be the >>>> last owner. Yes, those can be caught at runtime if the C side is >>>> properly annotated with might_sleep(), but that's not always the case. >>>> >>>> If we defer the drop of the data only when the FenceCb is >>>> dropped/recycled, we're at least not constrained by this "runs in >>>> atomic context" thing. >>>> >>> >>> This design does not solve it, because one can quite trivially get around >>> this >>> restriction using Option<T> as I said. If your point is “don’t run any >>> drop() here”, >>> then &mut self doesn’t do it. >> >> My bad, I thought you were talking about some Option<T> in >> FenceCbRegistration<T> (there was one at some point, but it's gone now), >> but you're talking about having an Option<X> inside the T. Yes, there's >> indeed nothing preventing a drop on X in that path, and it's just as >> bad as passing the fence back as value to the callback in that case. > > Then maybe we should just pass it by value and require implementation > of an unsafe trait on `T`, whose safety-requirements demand that this > must be save to drop from atomic context. > >> >>> >>>>> >>>>> Consider a fence design where signal() consumes self. Now consider this: >>>>> >>>>> ``` >>>>> impl FenceCb for MyCallback { >>>>> fn called(&mut self) { >>>>> // Can't move the fence out, so we have to put an Option<T> just to be >>>>> able >>>>> // to move. >>>>> if let Some(f) = self.some_fence.take() { >>>>> f.signal(); >>>>> } >>>>> } >>>>> ``` >>>>> >>>>> This used to be the case when our version of the job queue used the "proxy >>>>> fence" design: >>>>> >>>>> >>>>> ``` >>>>> // Callback on the hw fence >>>>> impl FenceCb for MyCallback { >>>>> fn called(&mut self) { >>>>> if let Some(f) = self.submit_fence.take() { >>>>> f.signal(); >>>>> } >>>> >>>> I'm pretty sure lockdep won't like it anyway, because this is nested >>>> locking of the same lock class. For such proxies, we'll need to teach >>>> lockdep about the nesting like has been recently done on >>>> dma_fence_array & co. But I'm digressing. >>> >>> Yeah, but this is more about resource transfer in general, not >>> this pattern specifically. >>> >>> I agree that this has issues, and yes, lockdep complained back >>> then :) >> >> The thing is, there's so many aspects that could go wrong because of the >> context this callback is called in. Nested locking is one of them, >> the fact we can't sleep is another. And with rust it's even worse, >> because of the implicit drops that will happen when you take ownership >> of resources (taking sleeping locks to remove resources from a dataset >> for instance). > > Doesn't that have to be a problem for much of Rust infrastructure? How > do other parties solve that? > >> >> So, by passing self by value to the ::callback(), you're basically >> telling users "hey, BTW, don't forget to defer the drop to some >> workqueue if you think it's not atomic-safe". And how can users know >> that the thing they're about to drop can be dropped in atomic context? >> They basically have to audit the ::drop() of all the resources they >> embed in their type implementing FenceCb. Not only that, but they also >> have to design the thing so the deferral of this ::drop() doesn't >> allocate, because, obviously, allocating in atomic context is >> tricky/fallible. AFAIK, none of this can be spot at compile-time (I >> remember Gary/Danilo mentioning that we could teach the klint about >> some of these rules). This would leave us with runtime checks like >> might_sleep(), but most of the C putters (xxx_put(object)) don't have >> might_sleep() in the path where the decref doesn't lead to a refcnt=0 >> situation. >> >> TLDR; Call this PTSD if you want, but this is the sort of bugs I >> struggled with on the C side, and I can predict that the exact same >> will happen in rust drivers if we expose the FenceCb as it is designed >> here and we don't have a way to check the soundness of the FenceCb >> implementations at compile time. > > My guess would be that the existence of unsafe-traits is the admission > of Rust that this just cannot be guaranteed by design. > > If a driver cannot know whether this or that is safe to drop, then it > would have to defer it's dropping. Or would there be cases where this > also doesn't work? Although I totally understand where Boris is coming from here, and I agree with him, the reality is that the current &mut self design doesn’t solve this. An unsafe trait could work as a pinky-promise by drivers, which is half-way there. What we ideally would like to have is a bound though, something like: T: !Drop If I recall correctly there were people working to get support for that on Rust? I think there are two things here: !Trait, which is not supported except for !Sized IIRC, and having an auto trait that represents types that implement Drop, similar to Send and Sync. > >> >> The other option (the one I've been advocating for from the start), is >> to not let drivers implement FenceCb (make it private), but instead >> have a bunch of implementations that we know are safe. Here's a list of >> implementations that I think would unblock most of the drivers use >> cases: >> >> - wakeup a thread >> - complete a completion object >> - schedule a WorkItem >> - schedule a kthread_worker (once we get a proper rust abstraction for >> that) >> >> It doesn't mean we can't have optimized FenceCb implementations that do >> a lot more in the callback() path instead of deferring to a >> workqueue/thread, but at least those would have to be implemented in >> dma_fence.rs, and the dma_fence.rs maintainers can then carefully audit >> the code as part of the review process, which we know is not really the >> case when changes touch drivers code only. > > Pragmatically speaking, if the common cases are trivial, then the > drivers will get them right, because those critical primitives are > already atomic-safe. No, you can still fumble trivial code, specially when it has to be cargo-culted from somewhere else. I am not saying that having things in dma_fence . rs is the solution, although it is definitely one solution. But the argument above isn’t very strong IMHO. > > And a non-common case will have to be implemented in the driver > anyways, so we'd have to allow for that. > > > > P.

