On Fri, 6 Mar 2026 16:25:25 +0100
Boris Brezillon <[email protected]> wrote:

> On Fri, 6 Mar 2026 15:37:31 +0100
> Christian König <[email protected]> wrote:
> 
> > On 3/6/26 14:36, Philipp Stanner wrote:  
> > >>>>> In other words not signaling a fence can leave the system in a
> > >>>>> deadlock state, but signaling it incorrectly usually results in
> > >>>>> random data corruption.    
> > >>>
> > >>> Well, not signaling it results in a potential deadlock of the
> > >>> whole kernel, whereas wrongly signaling it is "only" a functional
> > >>> bug.    
> > >>
> > >> No, that results in random memory corruption. Which is easily a
> > >> magnitude worse than just a kernel deadlock.
> > >>
> > >> We have seen such bugs numerous times with suspend and resume on
> > >> laptops, in different subsystems, not only GPU. And because of
> > >> that experience I'm absolutely clearly rejecting any attempt to
> > >> signal DMA fences when an object goes out of scope.    
> > > 
> > > But you're aware that both in C and Rust you could experience UAF
> > > bugs if fences drop unsignaled and the driver unloads?
> > > 
> > > Though I tentatively agree that memory corruptions on a large scale
> > > are probably the worst error you can have on a computer.    
> > 
> > Yeah, of course I'm aware of the UAF issue we have.
> > 
> > But those are relatively harmless compared to the random memory
> > corruption issues.
> > 
> > Linux has the unfortunate habit of re-using memory directly after
> > freeing it because that means caches are usually hotter.
> > 
> > So stray DMA operations have the tendency to end up anywhere, and
> > tools like KASAN can't find anything wrong.
> > 
> > The only protection you sometimes have is an IOMMU, but even that
> > is usually not able to catch everything.
> >   
> > >>    
> > >>>> It all stands and falls with the question whether a fence can
> > >>>> drop by accident in Rust, or if it will only ever drop when the
> > >>>> hw-ring is closed.
> > >>>>
> > >>>> What do you believe is the right thing to do when a driver
> > >>>> unloads?    
> > >>>
> > >>> The fence has to be signaled -- ideally after shutting down all
> > >>> queues, but it has to be signaled.    
> > >>
> > >> Yeah, well, this shutting down of all queues (and making sure
> > >> that no write operation is pending in caches etc...) is what
> > >> people usually don't get right.
> > >>
> > >> What you can do is things like setting your timeout to zero,
> > >> immediately terminating the HW operation and resetting the
> > >> device.
> > >>
> > >> This then uses the same code path as the mandatory timeout
> > >> functionality for DMA operations, and that usually works
> > >> reliably.    
> > > 
> > > Why is all that even an issue? The driver controls the hardware and
> > > must "switch it off" before it unloads. Then the hardware will not
> > > access any memory anymore for sure.    
> > 
> > Well, exactly that is usually really complicated. Let me try to
> > explain it with a HW example.
> > 
> > Between a shader and the actual system memory you usually have
> > different IP blocks or stages where a memory access needs to go
> > through:
> > 
> > Shader -> device VM -> device cache -> PCI bus -> CPU cache -> memory
> > 
> > Now when you want to terminate some shader, or make some memory
> > inaccessible because it is freed, drivers update their page tables
> > and issue the equivalent of a TLB invalidation on a CPU.
> > 
> > The problem is now that this will only invalidate the translation in
> > the device VM. It doesn't affect the device cache nor any ongoing
> > memory transaction on the bus which waits to snoop the CPU cache.
> > 
> > To make sure that you don't corrupt system memory you actually need
> > to wait for a cache flush event to be signaled, and *not* just
> > update the VM page tables and tell the HW to invalidate its TLB.
> > 
> > So what is needed is usually a fence operation, in other words a
> > memory value written over the PCIe bus into system memory. The
> > background is that PCIe memory writes are ordered, so this one
> > arrives after all previous memory writes of the device and is
> > therefore in the correct order.
> > 
> > Only when the CPU sees this memory write can you be sure that the
> > operation has completed.
> > 
> > This memory write is then often implemented by using an MSI interrupt
> > which in turn signals the DMA fence.
> > 
> > 
> > So the right thing to do is to wait for the DMA fence to signal
> > through its normal signaling path which includes both HW and SW
> > functionality and *not* just tell the HW to stop some ring and then
> > just assume that this is also sufficient to signal the DMA fence
> > associated with the HW operation.  
> 
> Ultimately this
> "stop-HW-and-make-sure-all-outcomes-are-visible-even-for-partially-executed-jobs"
> is something you'll have to do, no matter what. But whether that
> implies having to wait for each pending fence, I'm not so sure. What
> about the case where jobs/ops further down the HW ring were never
> even considered for execution, because the STOP operation prevented
> them from being dequeued? I'd expect that the only event we'll get
> for those is "HW queue is properly stopped now". So at that point
> it's a matter of signalling everything that's left, no? I mean,
> that's basically what Panthor does:
> 
> 1. it stops the queue
> 2. it waits for all executing ops to land (with all the cache
>    maintenance, etc., you described)
> 3. it signals (ECANCELED) what's left (things not picked up by the
>    HW by the time the STOP was effective)
> 
> It's currently done manually, but does it have to be?
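To make the ordering concrete, here is a rough userspace sketch of that teardown sequence. All types here are toy stand-ins I made up for illustration (this is not the Panthor code nor the kernel dma_fence API); the point is only the order: stop, drain in-flight work through its normal completion path, then cancel everything the HW never picked up.

```rust
use std::sync::{Arc, Mutex};

// Toy fence state: Canceled models "signaled with ECANCELED".
#[derive(Clone, Copy, PartialEq, Debug)]
enum FenceState {
    Pending,
    Signaled, // job ran to completion through the normal path
    Canceled, // job never reached the HW, signaled with ECANCELED
}

struct Fence {
    state: Mutex<FenceState>,
}

impl Fence {
    fn new() -> Arc<Self> {
        Arc::new(Fence { state: Mutex::new(FenceState::Pending) })
    }
    fn signal(&self, s: FenceState) {
        *self.state.lock().unwrap() = s;
    }
    fn state(&self) -> FenceState {
        *self.state.lock().unwrap()
    }
}

// `executing`: jobs the HW already picked up when STOP was issued.
// `queued`: jobs still sitting in the ring, never dequeued.
fn teardown(executing: &[Arc<Fence>], queued: &[Arc<Fence>]) {
    // 1. stop the queue (elided here: ring STOP + waiting for the
    //    "HW queue is properly stopped" event).

    // 2. let in-flight jobs land through their *normal* signaling
    //    path (cache flushes, fence write, IRQ). Modeled here by
    //    simply marking them signaled.
    for f in executing {
        f.signal(FenceState::Signaled);
    }

    // 3. only now is it safe to signal the never-dequeued jobs with
    //    ECANCELED: no DMA can touch their memory anymore.
    for f in queued {
        f.signal(FenceState::Canceled);
    }
}
```

The design point is that step 3 is purely a SW operation and only becomes safe once step 2 has proven the device is quiescent.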

All this being said, I'm also a pragmatic guy, so if you tell us "no
way!" even after these arguments, I'd rather give up on this
auto-signaling feature and force Rust drivers to manually signal
things than have the whole Fence abstraction blocked. We can add a
warn_on!(!is_signaled()) in the DriverFence::drop() path for the time
being, so that we at least catch cases where the driver didn't signal
the fence before dropping the signal-able object.
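For illustration, here is a minimal userspace sketch of what such a drop-path check could look like. The types are toy stand-ins (not the actual kernel Rust abstractions), and a static flag plus eprintln! model what warn_on! would do in the kernel; the fence is deliberately *not* auto-signaled on drop.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Set when a fence is dropped unsignaled, so the bug is observable
// (a kernel version would emit a WARN instead).
static DROPPED_UNSIGNALED: AtomicBool = AtomicBool::new(false);

// Toy stand-in for a driver fence wrapping a dma_fence; `signaled`
// models dma_fence_is_signaled().
struct DriverFence {
    signaled: AtomicBool,
}

impl DriverFence {
    fn new() -> Self {
        DriverFence { signaled: AtomicBool::new(false) }
    }
    fn signal(&self) {
        self.signaled.store(true, Ordering::Release);
    }
    fn is_signaled(&self) -> bool {
        self.signaled.load(Ordering::Acquire)
    }
}

impl Drop for DriverFence {
    fn drop(&mut self) {
        // The proposed warn_on!(!is_signaled()): don't signal on the
        // driver's behalf, just make the missed signal loud.
        if !self.is_signaled() {
            DROPPED_UNSIGNALED.store(true, Ordering::Relaxed);
            eprintln!("DriverFence dropped without being signaled!");
        }
    }
}
```

A fence that went through its normal signaling path drops silently; one that didn't trips the warning, which is exactly the "catch the buggy driver without corrupting memory" trade-off described above.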
