On Tue, Dec 30, 2025 at 04:42:51PM +0100, Marco Crivellari wrote:
> On Wed, Dec 24, 2025 at 7:28 PM Matthew Brost <[email protected]> wrote:
> >
> > [...]
> >
> > >  drivers/gpu/drm/xe/xe_tlb_inval.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
> > > index 918a59e686ea..b2cf6e17fbc5 100644
> > > --- a/drivers/gpu/drm/xe/xe_tlb_inval.c
> > > +++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
> > > @@ -94,7 +94,7 @@ static void xe_tlb_inval_fence_timeout(struct work_struct *work)
> > >  		xe_tlb_inval_fence_signal(fence);
> > >  	}
> > >  	if (!list_empty(&tlb_inval->pending_fences))
> > > -		queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
> > > +		queue_delayed_work(system_percpu_wq, &tlb_inval->fence_tdr,
> >
> > Actually, neither system_wq nor system_percpu_wq works here, as this is
> > the fence signaling path. We should use one of Xe's ordered work queues,
> > which is properly set up to be reclaim safe.
>
> Hi,
>
> So only for this specific workqueue we should use, for example, this instead:
>
> 462         /** @ordered_wq: used to serialize compute mode resume */
> 463         struct workqueue_struct *ordered_wq;
>
> I noticed this is allocated with:
>
> 490         xe->ordered_wq = alloc_ordered_workqueue("xe-ordered-wq", 0);
>
> Using alloc_ordered_workqueue() makes this workqueue unbound:
>
> 569 #define alloc_ordered_workqueue(fmt, flags, args...)			\
> 570 	alloc_workqueue(fmt, WQ_UNBOUND | __WQ_ORDERED | (flags), 1, ##args)
>
> So this patch should be split in 2:
> - 1 patch using Xe's ordered workqueue, which "implicitly" changes the
>   behavior to unbound
> - 1 patch replacing system_wq with the new per-cpu wq (system_percpu_wq).
>
> To keep this workqueue per-cpu we can use xe->unordered_wq, which is
> allocated with alloc_workqueue() without specifying flags (e.g. WQ_UNBOUND
> or the new WQ_PERCPU), making it per-cpu.
>
> Would this sound reasonable to you?
>
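To restate the allocation contrast you're describing in code (rough
illustration only -- the unordered_wq name and call site are from memory,
not checked against the tree):

	/* ordered: implicitly WQ_UNBOUND, one work item in flight at a time */
	xe->ordered_wq = alloc_ordered_workqueue("xe-ordered-wq", 0);

	/* no WQ_UNBOUND passed, so the queue stays bound (per-cpu); the new
	 * WQ_PERCPU flag would only make that default explicit */
	xe->unordered_wq = alloc_workqueue("xe-unordered-wq", 0, 0);
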
What I'd probably do here is store a WQ pointer in 'struct xe_tlb_inval'
and update all calls in xe_tlb_inval to use that work queue. Since these
are currently tied to a GT (see xe_gt_tlb_inval_init_early), I'd set the
WQ pointer in 'struct xe_tlb_inval' to 'gt->ordered_wq'. 'gt->ordered_wq'
is the per-GT WQ for resets, job timeouts, and VF migrations (all
operations in the reclaim path), so it seems to fit here as well. A rough
sketch is at the bottom of this mail.

Matt

> Thanks!
> --
>
> Marco Crivellari
>
> L3 Support Engineer
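Completely untested sketch of the above -- the 'wq' member name, the
embedding as gt->tlb_inval, and the 'delay' placeholder are all made up,
not checked against the tree:

	/* xe_tlb_inval_types.h: let the TLB inval state carry its own WQ */
	struct xe_tlb_inval {
		/* ...existing members... */

		/** @wq: reclaim-safe WQ all xe_tlb_inval work is queued on */
		struct workqueue_struct *wq;
	};

	/* xe_gt_tlb_inval_init_early(): point it at the per-GT ordered WQ */
	gt->tlb_inval.wq = gt->ordered_wq;

	/* xe_tlb_inval_fence_timeout() and the other queue sites then use it */
	if (!list_empty(&tlb_inval->pending_fences))
		queue_delayed_work(tlb_inval->wq, &tlb_inval->fence_tdr,
				   delay /* timeout expression unchanged */);

That way the reclaim-safety requirement is captured in one place instead
of at every queue site.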
