On 2/10/26 11:36, Danilo Krummrich wrote:
> On Tue Feb 10, 2026 at 11:15 AM CET, Alice Ryhl wrote:
>> One way you can see this is by looking at what we require of the
>> workqueue. For all this to work, it's pretty important that we never
>> schedule anything on the workqueue that's not signalling safe, since
>> otherwise you could have a deadlock where the workqueue is executing some
>> random job calling kmalloc(GFP_KERNEL) and then blocks on our fence,
>> meaning that the VM_BIND job never gets scheduled since the workqueue
>> is never freed up. Deadlock.
>
> Yes, I also pointed this out multiple times in the past in the context of
> C GPU scheduler discussions. It really depends on the workqueue and how it
> is used.
>
> In the C GPU scheduler the driver can pass its own workqueue to the
> scheduler, which means that the driver has to ensure that at least one out
> of the wq->max_active works is free for the scheduler to make progress on
> the scheduler's run and free job work.
>
> Or in other words, there must be no more than wq->max_active - 1 works that
> execute code violating the DMA fence signalling rules.
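For illustration, a minimal sketch of a workqueue dedicated to the scheduler
along those lines; the names are made up and the exact drm_sched_init()
interface varies between kernel versions, so the scheduler init call itself
is omitted:

#include <linux/errno.h>
#include <linux/workqueue.h>

/*
 * Purely illustrative sketch, not taken from any driver: a workqueue
 * dedicated to the scheduler's run and free job work.  Since nothing
 * else is ever queued on it, "no more than wq->max_active - 1 works
 * violating the fence signalling rules" holds trivially, and
 * WQ_MEM_RECLAIM (see below) gives it a rescuer so the work can still
 * run under memory pressure.
 */
static struct workqueue_struct *my_sched_wq;

static int my_driver_sched_wq_init(void)
{
	my_sched_wq = alloc_ordered_workqueue("my-drm-sched", WQ_MEM_RECLAIM);
	if (!my_sched_wq)
		return -ENOMEM;

	/*
	 * Hand my_sched_wq to the scheduler as its submit workqueue here,
	 * e.g. via drm_sched_init(); call omitted since the interface
	 * differs between kernel versions.
	 */
	return 0;
}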
*And* the workqueue must be created with WQ_MEM_RECLAIM so that work items
can still start under memory pressure and do not potentially cycle back into
memory management to wait for a dma_fence to signal.

But apart from that your explanation is perfectly correct, yes.

Thanks,
Christian.

> This is also why the JobQ needs its own workqueue and relying on the
> system WQ is unsound.
>
> In case of an ordered workqueue, it is of course always a potential
> deadlock to schedule work that does non-atomic allocations or takes a
> lock that is used elsewhere for non-atomic allocations.
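To make the ordered-workqueue case concrete, a small invented example of the
kind of work that must never share such a queue with the scheduler's work
(nothing here is from an actual driver):

#include <linux/slab.h>
#include <linux/workqueue.h>

/*
 * Invented example of the hazard described above: on an ordered workqueue
 * only one work item runs at a time.  If this work is queued on the same
 * ordered workqueue that the scheduler uses for its run/free job work, the
 * GFP_KERNEL allocation may enter direct reclaim and wait for a dma_fence,
 * while the work that would signal that fence can never start because the
 * single slot is already taken.
 */
static void unsafe_work_fn(struct work_struct *work)
{
	/* May block in reclaim, which in turn may wait on a fence. */
	void *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);

	if (!buf)
		return;
	/* ... use buf ... */
	kfree(buf);
}

/* Queuing this on the scheduler's ordered workqueue is the deadlock. */
static DECLARE_WORK(unsafe_work, unsafe_work_fn);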
