On Fri, 06 Mar 2026 13:36:48 +0100 Philipp Stanner <[email protected]> wrote:
> On Fri, 2026-03-06 at 13:31 +0100, Christian König wrote:
> > On 3/6/26 12:57, Philipp Stanner wrote:
> > > On Fri, 2026-03-06 at 12:24 +0100, Boris Brezillon wrote:
> > > > On Fri, 6 Mar 2026 12:03:19 +0100
> > > > Christian König <[email protected]> wrote:
> > > > 
> > > > > On 3/6/26 11:37, Boris Brezillon wrote:
> > > > > > On Fri, 6 Mar 2026 10:58:07 +0100
> > > > > > Christian König <[email protected]> wrote:
> > > > > > 
> > > > > > > On 3/6/26 10:46, Boris Brezillon wrote:
> > > > > > > > On Fri, 6 Mar 2026 09:10:52 +0100
> > > > > > > > Christian König <[email protected]> wrote:
> > > > > > > > 
> > > > > > > > > Well, as I wrote above, you either have super reliable
> > > > > > > > > locking in your signaling path or you will need it for
> > > > > > > > > error handling.
> > > > > > > > 
> > > > > > > > Not really. With Rust's ownership model, you can make it so
> > > > > > > > only one thread gets to own the DriverFence (the signal-able
> > > > > > > > fence object), and the DriverFence::signal() method consumes
> > > > > > > > this object. This implies that only one path gets to signal
> > > > > > > > the DriverFence, and after that it vanishes, so no one else
> > > > > > > > can signal it anymore. Just to clarify, by "vanishes" I mean
> > > > > > > > that the signal-able view disappears, but the observable
> > > > > > > > object (Fence) can stay around, so it can be monitored (and
> > > > > > > > only monitored) by others. With this model, it doesn't
> > > > > > > > matter whether _set_error() is called under a dma_fence
> > > > > > > > locked section or not, because the concurrency is addressed
> > > > > > > > at a higher level.
> > > > > > > 
> > > > > > > That whole approach won't work. You have at least the IRQ
> > > > > > > handler which signals completion and the timeout handler
> > > > > > > which signals completion with an error.
> > > > > > 
> > > > > > From a pure Rust standpoint, and assuming both paths (IRQ
> > > > > > handler and timeout handler) are written in Rust, the compiler
> > > > > > won't let you signal concurrently if we design the thing
> > > > > > properly; that's what I'm trying to say. Just to be clear, it
> > > > > > doesn't mean you can't have one worker (in a workqueue context)
> > > > > > that can signal a fence and an IRQ handler that can signal the
> > > > > > same fence. It just means that Rust won't let you do that unless
> > > > > > you have proper locking in place, and Rust will also guarantee
> > > > > > you won't be able to signal a fence that has already been
> > > > > > signaled, because as soon as it's signaled, the signal-able
> > > > > > fence should be consumed.
> > > > > 
> > > > > Ah, got it! I've worked a lot with OCaml in the past, which has
> > > > > some similarities but doesn't push things that far.
> > > > > 
> > > > > > > We have documented that this handling is mandatory for
> > > > > > > DMA-fences since so many driver implementations got it wrong.
> > > > > > 
> > > > > > Again, I'm just talking about the Rust implementation we're
> > > > > > aiming for. If you start mixing C and Rust in the same driver,
> > > > > > you're back to the original problem you described.
> > > > > 
> > > > > The key point is that the Rust implementation should not repeat
> > > > > the mistakes we made in the C implementation.
> > > > > 
> > > > > For example, preventing multiple threads from signaling a
> > > > > DMA-fence is completely irrelevant.
> > > > 
> > > > From a correctness standpoint, I think it's important to ensure no
> > > > more than one thread gets to signal the object.
> > > 
> > > If you have two paths that can signal a fence, that will effectively
> > > result in you having to use yet another lock for a fence in Rust,
> > > and likely some mechanism for revoking the access.
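For illustration, the consume-on-signal model Boris describes can be sketched in plain userspace Rust. The DriverFence/Fence names come from the thread; the state representation and method names below are hypothetical, not the actual kernel abstraction:

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Arc;

// Observable side: anyone holding an Arc<Fence> can poll it, but this
// type offers no way to signal.
pub struct Fence {
    // 0 = pending; a negative value = signaled with that errno;
    // i32::MAX = signaled without error (stand-in encoding).
    state: AtomicI32,
}

impl Fence {
    pub fn is_signaled(&self) -> bool {
        self.state.load(Ordering::Acquire) != 0
    }

    pub fn error(&self) -> Option<i32> {
        match self.state.load(Ordering::Acquire) {
            0 | i32::MAX => None,
            e => Some(e),
        }
    }
}

// Signal-able side: not Clone, so exactly one owner exists, and
// signal() takes `self` by value, consuming the only handle. After
// signaling, only the observable Arc<Fence> remains, so a second
// signal attempt is a compile error rather than a runtime race.
pub struct DriverFence {
    fence: Arc<Fence>,
}

impl DriverFence {
    pub fn new() -> (DriverFence, Arc<Fence>) {
        let fence = Arc::new(Fence { state: AtomicI32::new(0) });
        (DriverFence { fence: fence.clone() }, fence)
    }

    pub fn signal(self) {
        self.fence.state.store(i32::MAX, Ordering::Release);
    }

    // err is expected to be a negative errno (e.g. -125 for ECANCELED).
    pub fn signal_with_error(self, err: i32) {
        self.fence.state.store(err, Ordering::Release);
    }
}
```

The key design point is that the IRQ path and the timeout path cannot both hold a DriverFence at once; whichever mechanism arbitrates between them (a lock, a channel, an Option::take()) must hand the single handle to exactly one of them.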
> > > 
> > > I would at least consider whether it isn't much easier to have the
> > > signalling function ignore multiple signal attempts.
> > > 
> > > AFAIU, in Rust we originally ended up at signal() consuming the
> > > fence because of the code UAF problem with data: T.
> > 
> > +1
> > 
> > > > > What we need to guarantee is correct timeout handling and that a
> > > > > DMA-fence can only signal from something delivered by a HW
> > > > > event, e.g. a HW interrupt or interrupt worker or similar.
> > > > 
> > > > We've mostly focused on coming up with a solution that would
> > > > annotate signaling paths in an automated way, and on making sure
> > > > dma_fence_signal() is never called outside of an annotated path:
> > > > - creation of DmaFenceWorkqueue/DmaFence[Delayed]Work, which
> > > >   guarantees all works are executed in a
> > > >   dma_fence_signalling_{begin,end}() section, so we can properly
> > > >   detect deadlocks (through lockdep)
> > > > - creation of a DmaFenceIrqHandler for the same reason
> > > > - we'll need variants for each new deferred mechanism drivers
> > > >   might want to use (kthread_worker?)
> > > > 
> > > > But there's currently no restriction on calling dma_fence_signal()
> > > > in a user thread context (IOCTL()). I guess that shouldn't be too
> > > > hard to add (is_user_task() to the rescue).
> > > > 
> > > > > A DMA-fence should *never* signal because of an IOCTL
> > > > 
> > > > Okay, that's understandable.
> > > > 
> > > > > or because some object runs out of scope, e.g. when you clean up
> > > > > a HW ring buffer, FW queue, etc.
> > > > 
> > > > We were actually going in the opposite direction:
> > > > auto-signal(ECANCELED) on DriverFenceTimeline object destruction
> > 
> > Absolutely clear NAK to that; we have iterated over that many times
> > before on the C side as well.
> > 
> > See below for the explanation of the background.
> > 
> > > > (which is the thing that would be attached to the HW ringbuf.
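The pairing guarantee Boris wants from DmaFenceWorkqueue (every work item runs inside a dma_fence_signalling_{begin,end}() section) can be sketched in plain Rust with an RAII guard. This is a hypothetical userspace stand-in: the lockdep hook is replaced by a depth counter so the pairing is observable, and DmaFenceWorkqueue here is not the proposed kernel type:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Observable nesting depth, standing in for the lockdep machinery.
static SECTION_DEPTH: AtomicUsize = AtomicUsize::new(0);

struct SignallingSection;

impl SignallingSection {
    fn enter() -> SignallingSection {
        // kernel analog: cookie = dma_fence_begin_signalling();
        SECTION_DEPTH.fetch_add(1, Ordering::SeqCst);
        SignallingSection
    }
}

impl Drop for SignallingSection {
    fn drop(&mut self) {
        // kernel analog: dma_fence_end_signalling(cookie);
        // Running on Drop means the section closes even on early
        // return or unwind, so it can never be left open by mistake.
        SECTION_DEPTH.fetch_sub(1, Ordering::SeqCst);
    }
}

pub fn in_signalling_section() -> bool {
    SECTION_DEPTH.load(Ordering::SeqCst) > 0
}

// Hypothetical workqueue wrapper: the only way to execute work through
// it is run(), which wraps the callback in a signalling section, so a
// driver cannot forget the annotation.
pub struct DmaFenceWorkqueue;

impl DmaFenceWorkqueue {
    pub fn run<R>(&self, work: impl FnOnce() -> R) -> R {
        let _section = SignallingSection::enter();
        work()
        // _section dropped here, closing the section.
    }
}
```

The same construction generalizes to the DmaFenceIrqHandler variant mentioned in the thread: wrap the only entry point, and the annotation becomes structurally unavoidable.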
> > > > The reason is: we don't want to leave unsignalled fences behind,
> > > 
> > > Not only do we not "want to", we actually *cannot*. We have to make
> > > sure all fences are signaled, because only this way can the C
> > > backend plus RCU protect the Rust code against UAF as well.
> > > 
> > > > and if the HW ring is gone, there's nothing that can signal it.
> > > > Mind explaining why you think this shouldn't be done? I originally
> > > > interpreted your suggestion as exactly the opposite.
> > > 
> > > I also don't get it. All fences must always get signaled; that's one
> > > of the most fundamental fence rules. Thus, if the last accessor to a
> > > fence drops, you do want to signal it with -ECANCELED.
> > 
> > All fences must always signal because the HW operation must always
> > complete or be terminated by a timeout.
> > 
> > If a fence signals only because it runs out of scope, then that means
> > you have a huge potential for data corruption, and that is even worse
> > than not signaling a fence.
> > 
> > In other words, not signaling a fence can leave the system in a
> > deadlock state, but signaling it incorrectly usually results in random
> > data corruption.

Forcing a manual signal doesn't really solve the problem, does it? I
mean, you can hand-roll your "signal(ECANCELED) all fences on this HW
ring on HW ring destruction" (you'll have to if it's not managed by
something else), but how much safer is that than providing a
DriverDmaFenceTimeline (or DriverDmaFenceContext, call it what you want)
that manages it for you? You'd have this object attached to your HW ring,
and on HW ring drop you get the DriverDmaFenceContext dropped too, and
the associated auto-signal(ECANCELED) called.

Ultimately, the UAF you're referring to still exists, and I agree it's
worse than a deadlock, but it's also not something we're magically immune
to just because we don't signal on ::drop().
If you think that's preferable, we can have that done in some
::signal_all() method that has to be explicitly called when the HW ring
is destroyed, but that's basically the same problem: you have no
guarantee that no other path will call it while the HW is still
active...

> It all stands and falls with the question of whether a fence can drop
> by accident in Rust, or if it will only ever drop when the HW ring is
> closed.

Let's say it's less likely to happen, not impossible. And if it does
happen, it means the user really wanted it to happen (thinking of
ManuallyDrop, for instance).
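For illustration, the auto-signal(ECANCELED)-on-drop idea Boris describes can be sketched in userspace Rust. DriverDmaFenceContext is the name from the thread; the fence representation (a bare atomic, 0 = pending) and method names are hypothetical stand-ins:

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::{Arc, Mutex};

const ECANCELED: i32 = 125; // Linux errno value

// Minimal observable fence state: 0 = pending, non-zero = signaled
// (negative values carry an errno).
pub struct FenceState(pub AtomicI32);

// Hypothetical context attached to the HW ring: it tracks every fence
// it issued, and its Drop implementation signals whatever is still
// pending with -ECANCELED, so dropping the context together with the
// ring cannot leave unsignaled fences behind.
pub struct DriverDmaFenceContext {
    pending: Mutex<Vec<Arc<FenceState>>>,
}

impl DriverDmaFenceContext {
    pub fn new() -> Self {
        DriverDmaFenceContext { pending: Mutex::new(Vec::new()) }
    }

    pub fn new_fence(&self) -> Arc<FenceState> {
        let f = Arc::new(FenceState(AtomicI32::new(0)));
        self.pending.lock().unwrap().push(f.clone());
        f
    }
}

impl Drop for DriverDmaFenceContext {
    fn drop(&mut self) {
        for f in self.pending.lock().unwrap().drain(..) {
            // compare_exchange only cancels fences that are still
            // pending; fences already signaled are left untouched.
            let _ = f.0.compare_exchange(
                0,
                -ECANCELED,
                Ordering::AcqRel,
                Ordering::Acquire,
            );
        }
    }
}
```

A ::signal_all() method, as floated above, would be the same loop called explicitly instead of from Drop; the difference is only who guarantees it runs, not what it does.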
