On Fri, 6 Mar 2026 09:10:52 +0100 Christian König <[email protected]> wrote:
> On 3/5/26 16:12, Boris Brezillon wrote:
> > Hi,
> >
> > On Thu, 5 Mar 2026 14:59:02 +0100
> > Christian König <[email protected]> wrote:
> >
> >> On 3/5/26 14:54, Philipp Stanner wrote:
> >>> Yo Christian,
> >>>
> >>> a while ago we were discussing this problem
> >>>
> >>> dma_fence_set_error(f, -ECANCELED);
> >
> > If you really have two concurrent threads setting the error, this part
> > is racy, though I can't think of any situation where concurrent
> > signaling of a set of fences wouldn't be protected by another external
> > lock.
>
> This is actually massively problematic and the reason why we have the WARN_ON
> in dma_fence_set_error().
>
> What drivers usually do is to disable the normal signaling path, e.g. turn
> off interrupts, and then set an error and signal the fence manually.
>
> The problem is that this has a *huge* potential for being racy, for example
> when you tell the HW to not give you an interrupt any more it can always be
> that interrupt processing has already started but wasn't able yet to grab a
> lock or similar.
>
> I think we should start enforcing correct handling and have a lockdep check
> in dma_fence_set_error() that the dma_fence lock is held while calling it.

Sure, I don't mind you dropping the non-locked variants and forcing
users to lock around set_error() + signal().

>
> >>> dma_fence_signal(f); // racy!
> >
> > This is not racy because dma_fence_signal() takes/releases the
> > lock internally. Besides, calling dma_fence_signal() on an already
> > signaled fence is considered an invalid pattern if I trust the -EINVAL
> > returned here[1].
>
> No, that is also something we want to remove. IIRC Philipp proposed some
> patches to clean that up already.

What do you mean? Do you want dma_fence_signal_locked() (or its
variants) to not return an error when the fence is already signaled, or
do you want to prevent this double-signal from happening? The plan for
the Rust abstraction is to do the latter.

>
> >>>
> >>>
> >>> I think you mentioned that you are considering to redesign the
> >>> dma_fence API so that users have to take the lock themselves to touch
> >>> the fence:
> >>>
> >>> dma_fence_lock(f);
> >>> dma_fence_set_error(f, -ECANCELED);
> >>> dma_fence_signal(f);
> >
> > I guess you mean dma_fence_signal_locked().
> >
> >>> dma_fence_unlock(f);
> >>>
> >>>
> >>> Is that still up to date? Is there work in progress about that?
> >>
> >> It's on my "maybe if I ever have time for that" list, but yeah I think it
> >> would be really nice to have and a great cleanup.
> >>
> >> We have a bunch of different functions which provide both a _locked() and
> >> _unlocked() variant just because callers were too lazy to lock the fence.
> >>
> >> Especially the dma_fence_signal function is overloaded 4 (!) times with
> >> locked/unlocked and with and without timestamp functions.
> >>
> >>> I discovered that I might need / want that for the Rust abstractions.
> >>
> >> Well my educated guess is for Rust you only want the locked function and
> >> never allow callers to be lazy.
> >
> > I don't think we have an immediate need for manual locking in rust
> > drivers (no signaling done under an already dma_fence-locked section
> > that I can think of), especially after the inline_lock you've
> > introduced. Now, I don't think it matters if only the _locked() variant
> > is exposed and the rust code is expected to acquire/release the lock
> > manually, all I'm saying is that we probably don't need that in drivers
> > (might be different if we start implementing fence containers like
> > arrays and chain in rust, but I don't think we have an immediate need
> > for that).
>
> Well as I wrote above you either have super reliable locking in your
> signaling path or you will need that for error handling.

Not really.

With Rust's ownership model, you can make it so only one thread gets to
own the DriverFence (the signal-able fence object), and the
DriverFence::signal() method consumes this object. This implies that
only one path gets to signal the DriverFence, and after that it
vanishes, so no one else can signal it anymore. Just to clarify, by
"vanishes" I mean that the signal-able view disappears, but the
observable object (Fence) can stay around, so it can be monitored (and
only monitored) by others.

With this model, it doesn't matter whether _set_error() is called under
a dma_fence-locked section or not, because the concurrency is addressed
at a higher level. Again, I'm not saying the changes Christian and you
have been discussing are pointless (they might help the C
implementations get things right), I'm just saying they're not strictly
needed for the Rust abstraction, that's all.
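
To make the ownership argument concrete, here is a minimal userspace sketch of the idea, not the actual kernel abstraction. DriverFence, Fence and DriverFence::signal() are the names used above; everything else (FenceState, new(), fence(), signal_with_error(), finish(), the hard-coded ECANCELED value, and the std Mutex standing in for whatever synchronization the real binding uses) is an assumption made up for illustration.

```rust
use std::sync::{Arc, Mutex};

/// Linux errno value for ECANCELED, hard-coded for this userspace sketch.
pub const ECANCELED: i32 = 125;

/// Hypothetical shared state; this is NOT the kernel's struct dma_fence.
#[derive(Debug, Default)]
struct FenceState {
    signaled: bool,
    error: Option<i32>,
}

/// Observable view: clonable, can be monitored, exposes no way to signal.
#[derive(Clone, Debug)]
pub struct Fence {
    // The Mutex only makes observer reads safe in this model; it is an
    // assumption and says nothing about how the real binding locks.
    state: Arc<Mutex<FenceState>>,
}

impl Fence {
    pub fn is_signaled(&self) -> bool {
        self.state.lock().unwrap().signaled
    }

    pub fn error(&self) -> Option<i32> {
        self.state.lock().unwrap().error
    }
}

/// Signal-able view: deliberately not Clone, so it has a single owner.
pub struct DriverFence {
    fence: Fence,
}

impl DriverFence {
    pub fn new() -> Self {
        DriverFence {
            fence: Fence {
                state: Arc::new(Mutex::new(FenceState::default())),
            },
        }
    }

    /// Hand out an observer-only handle.
    pub fn fence(&self) -> Fence {
        self.fence.clone()
    }

    /// Takes `self` by value: the signal-able view is gone afterwards.
    pub fn signal(self) {
        self.finish(None);
    }

    /// Hypothetical helper: set the error and signal in one consuming step.
    pub fn signal_with_error(self, errno: i32) {
        self.finish(Some(errno));
    }

    fn finish(self, error: Option<i32>) {
        let mut state = self.fence.state.lock().unwrap();
        state.error = error;
        state.signaled = true;
    }
}
```

With this shape, calling df.signal_with_error(-ECANCELED) moves df, so a later df.signal() on the same value is rejected by the compiler as a use of a moved value. The double-signal and the set_error()/signal() race are ruled out by construction instead of by the caller holding the fence lock, while Fence handles obtained earlier through fence() remain usable for monitoring.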
