On Tue, 2026-02-17 at 15:22 +0100, Christian König wrote:
> On 2/17/26 15:09, Alice Ryhl wrote:
> > On Tue, Feb 17, 2026 at 3:04 PM Philipp Stanner <[email protected]> wrote:
> > > > > > 
> > > > > > 

[…]

> > > > > > Thinking more about it, you should probably enforce that there
> > > > > > is only one signaling path per fence.
> > > > > 
> > > > > I'm not really convinced by this.
> > > > > 
> > > > > First, the timeout path must be a fence signalling path, because
> > > > > the reason you have a timeout in the first place is that the hw
> > > > > might never signal the fence. So if the timeout path deadlocks on a
> > > > > kmalloc(GFP_KERNEL) and the hw never comes around to wake you up,
> > > > > boom.
> > > > 
> > > > Mhm, good point. On the other hand, the timeout handling should
> > > > probably be considered part of the normal signaling path.
> > > 
> > > 
> > > Why would anyone want to allocate in a timeout path in the first place,
> > > especially for jobqueue?
> > > 
> > > Timeout -> close the associated ring. Done.
> > > JobQueue will signal the done_fences with -ECANCELED.
> > > 
> > > What would the driver want to allocate in its timeout path, i.e., its
> > > timeout callback?
> > 
> > Maybe you need an allocation to hold the struct delayed_work field
> > that you use to enqueue the timeout?
> 
> And the workqueue you schedule the delayed_work on must have the reclaim
> bit (WQ_MEM_RECLAIM) set.
> 
> Otherwise the workqueue may find all its kthreads busy and try to start
> a new one, e.g. by allocating a task structure...

OK, maybe I'm lost, but what delayed_work?

The jobqueue's delayed work item gets created either in JQ::new() or in
jq.submit_job(). Why would anyone, that is, any driver, implement a
delayed work in its timeout callback?

That doesn't make sense.

JQ notifies the driver from its delayed_work through
timeout_callback(), and in that callback the driver closes the
associated firmware ring.

And it drops the JQ. So it is gone. A new JQ will get a new timeout
work item.
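A minimal userspace sketch of that lifecycle, assuming the work item is allocated once in JQ::new() (one of the two options above) and merely re-armed by submit_job(). JobQueue, TimeoutWork, on_timeout and LIVE_WORK_ITEMS are hypothetical names for illustration, not the actual jobqueue API; the counter only exists to make the lifecycle observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts live timeout work items so the lifecycle can be observed (illustration only).
static LIVE_WORK_ITEMS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the jobqueue's delayed work item.
struct TimeoutWork {
    armed: bool,
}

impl TimeoutWork {
    fn new() -> Box<Self> {
        LIVE_WORK_ITEMS.fetch_add(1, Ordering::SeqCst);
        Box::new(TimeoutWork { armed: false })
    }
}

impl Drop for TimeoutWork {
    fn drop(&mut self) {
        LIVE_WORK_ITEMS.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Hypothetical jobqueue owning exactly one timeout work item.
struct JobQueue {
    // Allocated once in new(), so the timeout path never has to allocate it.
    timeout_work: Box<TimeoutWork>,
}

impl JobQueue {
    fn new() -> Self {
        JobQueue { timeout_work: TimeoutWork::new() }
    }

    /// Submission re-arms the existing item instead of allocating a new one.
    fn submit_job(&mut self) {
        self.timeout_work.armed = true;
    }

    /// Timeout path: reuses the item from new(); returns whether it was armed.
    fn on_timeout(&mut self) -> bool {
        let was_armed = self.timeout_work.armed;
        self.timeout_work.armed = false;
        was_armed
    }
}
```

Dropping the JobQueue frees its work item, and a fresh JobQueue starts with a fresh, unarmed one.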

That's basically all the driver must ever do. Maybe some logging and
stuff.

With firmware scheduling it should really be that simple.

And signalling / notifying userspace gets done by jobqueue.

Right?
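A rough userspace model of that split, with hypothetical names (JobQueue, Driver, MyDriver, handle_timeout) that are not the real jobqueue API: the driver's timeout_callback() only closes its ring, and the jobqueue then signals every still-pending done_fence with -ECANCELED. In this model the timeout path allocates nothing; it only walks fences that were set up at submit time.

```rust
/// ECANCELED as defined in asm-generic/errno.h; fence errors are stored
/// negated, following the kernel's negative-errno convention.
const ECANCELED: i32 = 125;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FenceState {
    Pending,
    /// Signaled; `None` means success, `Some(err)` carries a negative errno.
    Signaled(Option<i32>),
}

/// Hypothetical driver hook, modelled on timeout_callback() from the thread.
trait Driver {
    fn timeout_callback(&mut self);
}

/// Hypothetical jobqueue holding one done_fence per submitted job.
struct JobQueue {
    done_fences: Vec<FenceState>,
}

impl JobQueue {
    fn new() -> Self {
        JobQueue { done_fences: Vec::new() }
    }

    /// Submission path; in this model it is the only place that allocates.
    fn submit_job(&mut self) -> usize {
        self.done_fences.push(FenceState::Pending);
        self.done_fences.len() - 1
    }

    /// Normal completion reported by the hardware/firmware.
    fn complete(&mut self, job: usize) {
        self.done_fences[job] = FenceState::Signaled(None);
    }

    /// Timeout path: let the driver close its ring, then cancel every fence
    /// that is still pending. No allocation happens here.
    fn handle_timeout<D: Driver>(&mut self, driver: &mut D) {
        driver.timeout_callback();
        for fence in self.done_fences.iter_mut() {
            if *fence == FenceState::Pending {
                *fence = FenceState::Signaled(Some(-ECANCELED));
            }
        }
    }
}

/// Hypothetical firmware-scheduled driver.
struct MyDriver {
    ring_open: bool,
}

impl Driver for MyDriver {
    fn timeout_callback(&mut self) {
        // Close the firmware ring (plus maybe some logging); the jobqueue
        // does all of the fence signalling.
        self.ring_open = false;
    }
}
```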

> 
> You also potentially want device core dumps. Those usually use GFP_NOWAIT
> so that they can't cycle back and wait for some fence. The downside is
> that they can trivially fail under even light memory pressure.

Simply logging to dmesg should do the trick, shouldn't it?
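As a userspace analogy only: Rust's Vec::try_reserve stands in for a GFP_NOWAIT allocation here, since it returns an error instead of aborting (only that fail-instead-of-wait behaviour is modelled), and eprintln! stands in for dev_err()/dmesg. collect_diagnostics, try_alloc_dump and Diagnostics are hypothetical names. The idea is that the dump is best effort and the log line is the cheap fallback.

```rust
use std::collections::TryReserveError;

/// What the timeout path managed to record (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Diagnostics {
    /// A dump buffer of the requested size was obtained.
    Dump(Vec<u8>),
    /// The allocation failed; only a log line was printed.
    LogOnly,
}

/// Fallible allocation: returns Err instead of aborting when the request
/// cannot be satisfied.
fn try_alloc_dump(size: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf: Vec<u8> = Vec::new();
    buf.try_reserve(size)?;
    // Capacity is already >= size, so this does not allocate again.
    buf.resize(size, 0u8);
    Ok(buf)
}

/// Best-effort dump with a logging fallback (hypothetical helper).
fn collect_diagnostics(ring_id: u32, dump_size: usize) -> Diagnostics {
    match try_alloc_dump(dump_size) {
        Ok(buf) => Diagnostics::Dump(buf),
        Err(_) => {
            // Stand-in for dev_err(): write straight to the log.
            eprintln!("ring {}: job timed out, dump skipped", ring_id);
            Diagnostics::LogOnly
        }
    }
}
```

A request of usize::MAX bytes can never be satisfied, so it exercises the logging fallback deterministically.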


P.
