On Fri Mar 6, 2026 at 1:36 PM CET, Philipp Stanner wrote:
> On Fri, 2026-03-06 at 13:31 +0100, Christian König wrote:
>> All fences must always signal because the HW operation must always complete
>> or be terminated by a timeout.
>>
>> If a fence signals only because it runs out of scope then that means that you
>> have a huge potential for data corruption and that is even worse than not
>> signaling a fence.

If that happens, it is a functional bug; the potential data corruption is
confined to a separate memory object, e.g. GEM etc., no? I.e. it may fault the
GPU, but it does not fault the kernel.

>> In other words not signaling a fence can leave the system in a deadlock
>> state, but signaling it incorrectly usually results in random data
>> corruption.

Well, not signaling it results in a potential deadlock of the whole kernel,
whereas wrongly signaling it is "only" a functional bug.

> It all stands and falls with the question whether a fence can drop by
> accident in Rust, or if it will only ever drop when the hw-ring is
> closed.
>
> What do you believe is the right thing to do when a driver unloads?

The fence has to be signaled -- ideally after shutting down all queues, but it
has to be signaled.

> Ideally we could design it in a way that the driver closes its rings,
> the pending fences drop and get signaled with ECANCELED.
>
> Your concern seems to be a driver by accident dropping a fence while the
> hardware is still processing the associated job.

I'm not concerned about the "driver drops fence by accident" case, as it is less
problematic than the "driver forgets to signal the fence" case. One is a logic
bug, whereas the other can deadlock the kernel, i.e. it is unsafe in terms of
Rust.

(Technically, there are subsequent problems to solve, as core::mem::forget() is
safe and would cause the same problem. However, this is not new; it applies to
lock guards in general. We can catch such things with klint though.)

Ultimately, a DMA fence (that has been exposed to the outside world) is
technically equivalent to a lock guard.

> (how's that dangerous, though? Shouldn't parties waiting for the fence
> detect the error? ECANCELED ⇒ you must not access the associated
> memory)
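
To make the lock-guard analogy above concrete, here is a minimal userspace
sketch. It is not the kernel's Rust DMA fence abstraction: the names
`Signaller`, `Fence`, `Status` and `new_fence()` are hypothetical, and it only
uses std. It assumes a design where dropping the signalling side signals the
fence with ECANCELED, and shows that core::mem::forget() defeats this, just as
it would for a lock guard.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Linux errno value for ECANCELED, hard-coded because this is a userspace
/// sketch rather than kernel code.
pub const ECANCELED: i32 = 125;

/// What a waiter observes (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Signaled,   // the work completed normally
    Error(i32), // the fence was signaled with an error code
    TimedOut,   // the fence was not signaled within the timeout
}

struct Shared {
    state: Mutex<Option<Status>>,
    cond: Condvar,
}

/// Waiter side of the fence.
#[derive(Clone)]
pub struct Fence(Arc<Shared>);

/// Producer side. Owning it is an obligation to signal, much like holding a
/// lock guard is an obligation to unlock.
pub struct Signaller(Arc<Shared>);

pub fn new_fence() -> (Signaller, Fence) {
    let shared = Arc::new(Shared {
        state: Mutex::new(None),
        cond: Condvar::new(),
    });
    (Signaller(shared.clone()), Fence(shared))
}

impl Signaller {
    fn set(&self, status: Status) {
        let mut state = self.0.state.lock().unwrap();
        // The first signal wins; later attempts are ignored.
        if state.is_none() {
            *state = Some(status);
            self.0.cond.notify_all();
        }
    }

    /// Normal completion. Consumes `self`; the Drop impl still runs
    /// afterwards but is a no-op because the state is already set.
    pub fn signal(self) {
        self.set(Status::Signaled);
    }
}

impl Drop for Signaller {
    fn drop(&mut self) {
        // Dropping without signaling cancels the fence instead of leaving
        // waiters blocked forever.
        self.set(Status::Error(ECANCELED));
    }
}

impl Fence {
    /// Wait until the fence is signaled or the timeout expires.
    pub fn wait_timeout(&self, timeout: Duration) -> Status {
        let guard = self.0.state.lock().unwrap();
        let (guard, _) = self
            .0
            .cond
            .wait_timeout_while(guard, timeout, |s| s.is_none())
            .unwrap();
        (*guard).unwrap_or(Status::TimedOut)
    }
}
```

In this sketch, `signal()` makes waiters see `Status::Signaled`, and a plain
`drop()` of the `Signaller` makes them see `Status::Error(ECANCELED)`. After
`std::mem::forget()` on the `Signaller`, a waiter only ever gets
`Status::TimedOut`; without the timeout it would block forever, which is the
deadlock case discussed above.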
