On Fri, 2026-03-06 at 16:43 +0100, Boris Brezillon wrote:
> On Fri, 6 Mar 2026 16:25:25 +0100
> Boris Brezillon <[email protected]> wrote:
> 
> > On Fri, 6 Mar 2026 15:37:31 +0100
> > Christian König <[email protected]> wrote:
> > 
> > > On 3/6/26 14:36, Philipp Stanner wrote:  
> > > > > > > > In other words, not signaling a fence can leave the system in a
> > > > > > > > deadlock state, but signaling it incorrectly usually results in
> > > > > > > > random data corruption.
> > > > > > 
> > > > > > Well, not signaling it results in a potential deadlock of the
> > > > > > whole kernel, whereas wrongly signaling it is "only" a functional
> > > > > > bug.    
> > > > > 
> > > > > No, that results in random memory corruption. Which is easily a
> > > > > magnitude worse than just a kernel deadlock.
> > > > > 
> > > > > We have seen such bugs numerous times with suspend and resume on
> > > > > laptops in different subsystems, not only GPU. And I'm
> > > > > absolutely clearly rejecting any attempt to signal DMA fences when
> > > > > an object goes out of scope because of that experience.
> > > > 
> > > > But you're aware that both in C and Rust you could experience UAF
> > > > bugs if fences are dropped unsignaled and the driver unloads?
> > > > 
> > > > Though I tentatively agree that memory corruptions on a large scale
> > > > are probably the worst error you can have on a computer.    
> > > 
> > > Yeah, of course I'm aware of the UAF issue we have.
> > > 
> > > But those are relatively harmless compared to the random memory
> > > corruption issues.
> > > 
> > > Linux has the unfortunate habit of re-using memory directly after
> > > freeing it because that means caches are usually hotter.
> > > 
> > > So rogue DMA operations have a tendency to end up anywhere, and
> > > tools like KASAN can't find anything wrong.
> > > 
> > > The only protection you sometimes have is the IOMMU, but that is
> > > usually not able to catch everything either.
> > >   
> > > > >    
> > > > > > > It all stands and falls with the question whether a fence can
> > > > > > > drop by accident in Rust, or if it will only ever drop when the
> > > > > > > hw-ring is closed.
> > > > > > > 
> > > > > > > What do you believe is the right thing to do when a driver
> > > > > > > unloads?    
> > > > > > 
> > > > > > The fence has to be signaled -- ideally after shutting down all
> > > > > > queues, but it has to be signaled.    
> > > > > 
> > > > > Yeah, well, shutting down all queues (and making sure that no
> > > > > write operation is pending in caches etc.) is what people
> > > > > usually don't get right.
> > > > > 
> > > > > What you can do is things like setting your timeout to zero,
> > > > > immediately terminating the HW operation and resetting the
> > > > > device.
> > > > > 
> > > > > This will then use the same code path as the mandatory timeout
> > > > > functionality for DMA operations, and that usually works reliably.
> > > > 
> > > > Why is all that even an issue? The driver controls the hardware and
> > > > must "switch it off" before it unloads. Then the hardware will not
> > > > access any memory anymore for sure.    
> > > 
> > > Well exactly that is usually really complicated. Let me try to
> > > explain that on a HW example. 
> > > 
> > > Between a shader and the actual system memory you usually have
> > > different IP blocks or stages where a memory access needs to go
> > > through:
> > > 
> > > Shader -> device VM -> device cache -> PCI bus -> CPU cache -> memory
> > > 
> > > Now when you want to terminate some shader or make some memory
> > > inaccessible because it is freed, drivers update their page tables
> > > and issue the equivalent of a TLB invalidation on a CPU.
> > > 
> > > The problem is now that this will only invalidate the translation in
> > > the device VM. It doesn't affect the device cache nor any ongoing
> > > memory transaction on the bus which waits to snoop the CPU cache.
> > > 
> > > To make sure that you don't corrupt system memory you actually need
> > > to wait for a cache flush event to be signaled and *not* just update
> > > the VM page tables and tell the HW to invalidate its TLB.
> > > 
> > > So what is needed is usually a fence operation, in other words a
> > > memory value written over the PCIe bus into system memory. The
> > > background is that PCIe memory writes are ordered, so this one
> > > lands after all previous memory writes of the device and is
> > > therefore in the correct order.
> > > 
> > > Only when the CPU sees this memory write can you be sure that the
> > > operation is complete.
> > > 
> > > This memory write is then often implemented by using an MSI interrupt
> > > which in turn signals the DMA fence.
> > > 
> > > 
> > > So the right thing to do is to wait for the DMA fence to signal
> > > through its normal signaling path which includes both HW and SW
> > > functionality and *not* just tell the HW to stop some ring and then
> > > just assume that this is also sufficient to signal the DMA fence
> > > associated with the HW operation.  
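[A userspace Rust model of what Christian describes: teardown blocks until the fence is signaled through its normal path (here, a thread standing in for the MSI-backed fence write), and never force-signals the fence itself. All names are illustrative.]

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Minimal fence: a flag plus a condvar, modeling the DMA fence's
// normal signaling path.
struct Fence {
    inner: Mutex<bool>,
    cond: Condvar,
}

impl Fence {
    fn new() -> Self {
        Fence { inner: Mutex::new(false), cond: Condvar::new() }
    }

    // Called from the (simulated) interrupt path once the ordered
    // PCIe memory write has reached system memory.
    fn signal(&self) {
        *self.inner.lock().unwrap() = true;
        self.cond.notify_all();
    }

    // Teardown waits here; it never signals the fence on its own.
    fn wait(&self) {
        let mut signaled = self.inner.lock().unwrap();
        while !*signaled {
            signaled = self.cond.wait(signaled).unwrap();
        }
    }
}

fn main() {
    let fence = Arc::new(Fence::new());
    let hw = Arc::clone(&fence);
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10));
        hw.signal(); // the "MSI" arrives only after the flush completed
    });
    fence.wait(); // driver teardown blocks until the real completion
}
```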
> > 
> > Ultimately this
> > "stop-HW-and-make-sure-all-outcomes-are-visible-even-for-partially-executed-jobs"
> > is something you'll have to do, no matter what. But whether that
> > implies having to wait for each pending fence, I'm not so sure. What
> > about the case where jobs/ops further away in the HW ring were never
> > even considered for execution by the HW, because the STOP operation
> > prevented them from being dequeued? I'd expect that the only event
> > we'll get for those is "HW queue is properly stopped now". So at that
> > point it's a matter of signalling everything that's left, no? I mean,
> > that's basically what Panthor does:
> > 
> > 1. it stops the queue
> > 2. it waits for all executing ops to land (with all the cache
> > maintenance, etc., you described)
> > 3. it signals ECANCELED on what's left (things not picked up by the
> > HW by the time the STOP was effective).
> > 
> > It's currently done manually, but does it have to?
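[A userspace Rust model of the three-step Panthor teardown described above; all names are illustrative, not the actual Panthor code.]

```rust
// State of a job's fence after teardown.
#[derive(Debug, PartialEq)]
enum FenceState {
    Pending,   // still queued
    Signaled,  // job landed on the HW and completed
    Canceled,  // never picked up by the HW; signaled with ECANCELED
}

struct Job {
    started_on_hw: bool,
    state: FenceState,
}

// 1. stop the queue, 2. wait for in-flight jobs to land (cache
// maintenance included), 3. signal(ECANCELED) what's left.
fn teardown(queue: &mut [Job]) {
    for job in queue.iter_mut() {
        job.state = if job.started_on_hw {
            // A real driver waits here for the fence write to reach
            // system memory before declaring the job done.
            FenceState::Signaled
        } else {
            FenceState::Canceled
        };
    }
}
```

[Under this model, only the in-flight jobs require a real wait; everything the STOP fenced off can be bulk-canceled.]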
> 
> All this being said, I'm also a pragmatic guy, so if you tell us "no
> way!" even after these arguments, I'd rather give up on this
> auto-signaling feature and have Rust drivers be forced to manually
> signal stuff than have the whole Fence abstraction blocked. We can add
> a warn_on!(!is_signaled()) in the DriverFence::drop() path for the time
> being, so we can at least catch cases where the driver didn't signal
> the fence before dropping the signalable object.
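[For reference, a userspace Rust sketch of what such a drop-path check could look like: warn on an unsignaled drop instead of auto-signaling. The type and method names are illustrative, not the kernel Rust API.]

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Stand-in for the driver-side fence wrapper; not the kernel API.
struct DriverFence {
    signaled: AtomicBool,
}

impl DriverFence {
    fn new() -> Self {
        DriverFence { signaled: AtomicBool::new(false) }
    }

    fn signal(&self) {
        self.signaled.store(true, Ordering::Release);
    }

    fn is_signaled(&self) -> bool {
        self.signaled.load(Ordering::Acquire)
    }
}

impl Drop for DriverFence {
    fn drop(&mut self) {
        // Userspace stand-in for warn_on!(): catch drivers that drop
        // a fence without signaling it first, rather than silently
        // auto-signaling on drop.
        if !self.is_signaled() {
            eprintln!("WARN: DriverFence dropped while unsignaled");
        }
    }
}
```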


In past discussions with my team members we were also quite undecided
about whether we want autosignal or not.

The only thing I'm concerned about is the RCU vs. UAF feature. We
already invested a lot of time and pain into that feature, so we most
probably want it.


P.
