Hi Philipp,

On Tue, 2026-06-23 at 13:58 +0200, Philipp Stanner wrote:
> On Tue, 2026-06-23 at 12:37 +0100, André Draszik wrote:
> > On Thu, 2026-06-18 at 17:56 +0200, Philipp Stanner wrote:
> > > 
> > > I continue to believe because of bugs like this and the ones I have
> > > quoted in the threads above the robustness of the kernel could be
> > > greatly improved if we could get dma_fence fully synchronized with its
> > > lock.
> > 
> > On top of that, sashiko highlighted (via my other patch) that the existing
> > code is missing some memory barriers:
> > 
> > https://sashiko.dev/#/patchset/[email protected]?part=1
> > 
> > I believe lock synchronization would resolve that (as would adding explicit
> > memory barriers).
> 
> That is being discussed in the thread I linked, where Gary lists which
> barriers you would need for (presumably correct) lockless magic.

Having read Gary's suggestion, that aligns with what I had in mind.

> However, if my issue were to be solved with barriers, the
> test_and_set_bit() in dma_fence_signal_timestamp_locked() would have to
> be replaced with the more weakly ordered test_bit() and set_bit(),
> maybe creating other pitfalls.

For the avoidance of doubt, I'm not saying that all the issues you raised
can be solved by barriers instead of appropriate locks (I don't know enough
about the code and issues in general here).

I do think, however, that appropriate locks will fix the ordering issue
highlighted by sashiko (i.e. +1 for your argument). Barriers would fix this
specific issue, too, but that is not a statement about any wider issues.
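
As a generic illustration of the kind of ordering meant here (not the
actual dma_fence code, and not a reproduction of what sashiko flagged),
here is a minimal userspace sketch using C11 atomics. struct demo_fence
and the demo_fence_*() helpers are hypothetical names made up for this
example. The writer stores a payload and then publishes a flag with
release semantics; the reader loads the flag with acquire semantics
before touching the payload. In kernel code the corresponding primitives
would be smp_store_release() and smp_load_acquire().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a fence; NOT the kernel's struct dma_fence. */
struct demo_fence {
	int64_t timestamp;	/* plain payload, written before signaling */
	atomic_bool signaled;	/* flag that publishes the payload */
};

static void demo_fence_init(struct demo_fence *f)
{
	f->timestamp = 0;
	atomic_init(&f->signaled, false);
}

/* Signaller: write the payload first, then publish with release ordering. */
static void demo_fence_signal(struct demo_fence *f, int64_t now)
{
	f->timestamp = now;
	atomic_store_explicit(&f->signaled, true, memory_order_release);
}

/*
 * Reader: the acquire load pairs with the release store above, so if it
 * observes "true", the timestamp write is guaranteed to be visible too.
 * Returns false and leaves *ts untouched if the fence is not signaled.
 */
static bool demo_fence_get_timestamp(struct demo_fence *f, int64_t *ts)
{
	if (!atomic_load_explicit(&f->signaled, memory_order_acquire))
		return false;
	*ts = f->timestamp;
	return true;
}
```

Without the release/acquire pair, a weakly ordered CPU such as arm64 may
let the reader see the flag set while still observing a stale timestamp.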

> The ordering issue in the get_*_name() functions plays into that.
> Setting the bit would then be done after setting the ops-pointer to
> NULL. So one would have to try to move the NULL set, too.
> 
> Long story short, this is painful and subtle.
> 
> But I think what we are realizing over and over again is that dma_fence
> has many subtleties to its API contract, and the implementation's
> sparing use of spinlocks leads to workarounds where people take locks
> manually or have to do an RCU dance.
> 
> Note that Christian is strongly opposed to guarding everything with
> locks, in part for supposedly occurring deadlocks in the fence callbacks
> when the driver needs to take its own locks.

ww_mutex could help against deadlocks, but might affect performance if
these are all critical code paths (IDK).

> The community discussion regarding that problem is currently in some
> sort of dead end, where none of us seems to know what the correct path
> forward is.

Please ignore this if it doesn't make sense, I'm just a bystander :-)
How about at least adding the required barriers and related changes, and
taking it from there? This would solve some immediate and easy-to-hit
issues on Arm64. If the barriers turn out to be insufficient, the code can
still be changed.



> > > 
> [...]
> My understanding of the current situation is that as an issuer of
> dma_fence's you, in general, should wait for a grace period until you
> perform operations like driver unload, or, more generally, have fence-
> related resources and such being accessed through callbacks go away.

If I understand correctly, simply waiting for a grace period in the
driver's unbind should be the way to go.


> Danilo ... Maybe he's got the time to share some details with you that are
> relevant to your work.

Will wait a little :-)



BTW, thanks Philipp for all these details, much appreciated.

Cheers,
A.
