Hey,
Den 2026-02-26 kl. 15:38, skrev Sebastian Andrzej Siewior:
> On 2026-02-26 15:19:42 [+0100], To Maarten Lankhorst wrote:
>> On 2026-02-26 13:07:18 [+0100], To Maarten Lankhorst wrote:
>>> series somewhere I could pull and check. In meantime I would look what
>>> causes the lockup on i915.
>>
>> I think I got it.
>
> This the atomic sync as-is, IRQ-Work (FIFO-1) will be preempted by the
> threaded-interrupt (FIFO-50) and the interrupt will poll on
> signaler_active while the irq-work can't make progress.
>
> This will provide the needed sync:
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index a2b413982ce64..337f6e88faf05 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -209,6 +209,7 @@ static void signal_irq_work(struct irq_work *work)
> intel_breadcrumbs_disarm_irq(b);
>
> rcu_read_lock();
> + spin_lock(&b->signaler_active_sync);
> atomic_inc(&b->signaler_active);
> list_for_each_entry_rcu(ce, &b->signalers, signal_link) {
> struct i915_request *rq;
> @@ -246,6 +247,7 @@ static void signal_irq_work(struct irq_work *work)
> }
> }
> atomic_dec(&b->signaler_active);
> + spin_unlock(&b->signaler_active_sync);
> rcu_read_unlock();
>
> llist_for_each_safe(signal, sn, signal) {
> @@ -290,6 +292,7 @@ intel_breadcrumbs_create(struct intel_engine_cs
> *irq_engine)
> init_llist_head(&b->signaled_requests);
>
> spin_lock_init(&b->irq_lock);
> + spin_lock_init(&b->signaler_active_sync);
> init_irq_work(&b->irq_work, signal_irq_work);
>
> b->irq_engine = irq_engine;
> @@ -487,8 +490,11 @@ void intel_context_remove_breadcrumbs(struct
> intel_context *ce,
> if (release)
> intel_context_put(ce);
>
> - while (atomic_read(&b->signaler_active))
> + while (atomic_read(&b->signaler_active)) {
> + spin_lock(&b->signaler_active_sync);
> + spin_unlock(&b->signaler_active_sync);
> cpu_relax();
> + }
> }
>
> static void print_signals(struct intel_breadcrumbs *b, struct drm_printer *p)
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> index bdf09fd67b6e7..28dae32628aab 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> @@ -40,6 +40,7 @@ struct intel_breadcrumbs {
> struct list_head signalers;
> struct llist_head signaled_requests;
> atomic_t signaler_active;
> + spinlock_t signaler_active_sync;
>
> spinlock_t irq_lock; /* protects the interrupt from hardirq context */
> struct irq_work irq_work; /* for use from inside irq_lock */
>
Thinking some more, replacing signaler_active with signaler_active_sync might
be the best fix.
I'm not sure there's much use for parallel completion of the same breadcrumb,
and using completion
might be too heavy handed.
Kind regards,
~Maarten Lankhorst