On 3/17/26 16:21, Boris Brezillon wrote: > On Tue, 17 Mar 2026 15:48:25 +0100 > "Christian König" <[email protected]> wrote: > >> In case of a refcounting bug dma_fence_release() can be called before the >> fence was even signaled. >> >> Previously the dma_fence framework then force signaled the fence to make >> sure to unblock waiters, but that can potentially lead to random memory >> corruption when the DMA operation continues. So be more defensive here and >> pick the lesser evil. >> >> Instead of force signaling the fence set an error code on the fence, >> re-initialize the refcount to something large and taint the kernel. >> >> This will leak memory and eventually can cause a deadlock when the fence >> is never signaled, but at least we won't run into an use after free or >> random memory corruption. >> >> Signed-off-by: Christian König <[email protected]> >> --- >> drivers/dma-buf/dma-fence.c | 18 ++++++++++++++---- >> 1 file changed, 14 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c >> index 1826ba73094c..8bf07685a053 100644 >> --- a/drivers/dma-buf/dma-fence.c >> +++ b/drivers/dma-buf/dma-fence.c >> @@ -593,14 +593,24 @@ void dma_fence_release(struct kref *kref) >> /* >> * Failed to signal before release, likely a refcounting issue. >> * >> - * This should never happen, but if it does make sure that we >> - * don't leave chains dangling. We set the error flag first >> - * so that the callbacks know this signal is due to an error. >> + * This should never happen, but if try to be defensive and take >> + * the lesser evil. Initialize the refcount to something large, >> + * but not so large that it can overflow. >> + * >> + * That will leak memory and could deadlock if the fence never >> + * signals, but at least it doesn't cause an use after free or >> + * random memory corruption. >> + * >> + * Also taint the kernel to note that it is rather unreliable to >> + * continue. >> */ >> dma_fence_lock_irqsave(fence, flags); >> fence->error = -EDEADLK; >> - dma_fence_signal_locked(fence); >> + refcount_set(&fence->refcount.refcount, INT_MAX); > > I'm not convinced this is useful. If we leak the object, no one should > have a ref to release anyway. This does raise a question though. The > case we're trying to protect against is fence_callback being registered > to this fence and waiting for an event to signal another proxy fence.
Not quite. The real problematic case is that it is necessary to wait for a fence to signal with tons of memory management locks held. So it can be that a simple memory allocation cycles back and depends on the fence to signal. > How can the refcnt drop to zero in that case? Isn't the proxy supposed > to own a ref on the fence. Before we go further, I'd like to understand > what we're trying to do. Well we are in C here, so its simply coding errors. An unecessary dma_fence_put() in an error path is enough to trigger this. > The original discussion that led you to write this patch was about > detecting when a fence emitter/producer would leave unsignalled fences > behind, and the problem we have is when such unsignalled fences have > observers waiting for a "signalled" event. If the refcnt drops to zero > and the fence is released, we're already passed that point, > unfortunately. Well that is not quite correct. The most common problem is that we have unbalanced dma_fence_get()/dma_fence_put() and we end up in dma_fence_release() before the issuer of the dma_fence has a chance to signal it. See the main purpose of DMA fences is to prevent releasing memory back into the core memory management before the DMA operation is completed. So when a DMA fence signals to early it means that the HW is still writing to that memory but we already potentially re-using the memory ending in random memory corruption. UAF issues are harmless compared to that. Regards, Christian. > It can be that: > > - the fence was never exposed -> this is fine > - the fence was exposed but never observed -> this is broken, because if > it had been observed it would have led to a deadlock > - the fence was exposed, observed for some time, but the observer got > bored, stopped waiting and: > * decided to go and execute its stuff anyway -> use-before-ready > situation > * gave up -> kinda okay, but we should still consider the fence > emitter broken > - the fence observer registered a callback but didn't take a ref on the > object -> this is potential UAF on the dma_fence, which can also lead > to a VRAM/system-mem UAF if the emitter drops the dma_fence without > signalling, because of the auto-signal you're getting rid of in this > patch. But the latter is just a side effect of the dma_fence UAF, > which I'm not convinced we should try to protect against. > >> dma_fence_unlock_irqrestore(fence, flags); >> + rcu_read_unlock(); >> + add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK); >> + return; >> } >> >> ops = rcu_dereference(fence->ops); >
