On Thu, Jan 10, 2013 at 02:23:37AM +0000, Chris Wilson wrote:
> Once we thought we got semaphores working, we disabled kicking the ring
> if hangcheck fired whilst waiting upon a ring as it was doing more harm
> than good:
> 
> commit 4e0e90dcb8a7df1229c69e30abebb59b0b3c2a1f
> Author: Daniel Vetter <[email protected]>
> Date:   Wed Dec 14 13:56:58 2011 +0100
> 
>     drm/i915: kicking rings stuck on semaphores considered harmful
> 
> However, life is never that easy and semaphores are still causing
> problems whereby the value written by one ring (bcs) is not being
> propagated to the waiter (rcs). Thus the waiter never wakes up and we
> declare the GPU hung, which often has unfortunate consequences, even if
> we successfully reset the GPU.
> 
> But the GPU is idle as it has completed the work, just didn't notify its
> clients. So we can detect the incomplete wait during hang check and
> probe the target ring to see if has indeed emitted the breadcrumb seqno
> following the work and then and only then kick the waiter.
> 
> Based on a suggestion by Ben Widawsky.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=54226
> Signed-off-by: Chris Wilson <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Ben Widawsky <[email protected]>
> ---
>  drivers/gpu/drm/i915/i915_irq.c |   34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 39e4177..3cc52b2 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1693,6 +1693,32 @@ static bool i915_hangcheck_ring_idle(struct 
> intel_ring_buffer *ring, bool *err)
>       return false;
>  }
>  
> +static bool semaphore_passed(struct intel_ring_buffer *waiter)
> +{
> +     struct drm_i915_private *dev_priv = waiter->dev->dev_private;
> +     int acthd = intel_ring_get_active_head(waiter) & HEAD_ADDR;
> +     struct intel_ring_buffer *signaller;
> +     u32 cmd;
> +
> +     /* ACTHD is likely pointing to the dword after the actual command,
> +      * so scan backwards until we find the MBOX.
> +      */
> +#define WAIT (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE | 
> MI_SEMAPHORE_REGISTER)
> +     do {
> +             cmd = ioread32(waiter->virtual_start+acthd);
> +             if ((cmd & ~(0x3 << 16)) == WAIT)
> +                     break;
> +             acthd -= 4;
> +             if (acthd < 0)
> +                     return false;
> +     } while (1);
> +#undef WAIT
> +
> +     signaller = &dev_priv->ring[(waiter->id + (((cmd >> 16) & 3) - 1)) % 3];

I hate you for this line ^^

> +     return i915_seqno_passed(signaller->get_seqno(signaller, false),
> +                              ioread32(waiter->virtual_start+acthd+4)+1);
> +}
> +

For the sake of just testing if this makes the problem go away, this
could have been a lot simpler. Just check the RCS mbox (since in our
discussion, you said all the bugs had been RCS waiting).

i915_seqno_passed(bcs->get_seqno, GEN6_RBSYNC)

>  static bool kick_ring(struct intel_ring_buffer *ring)
>  {
>       struct drm_device *dev = ring->dev;
> @@ -1704,6 +1730,14 @@ static bool kick_ring(struct intel_ring_buffer *ring)
>               I915_WRITE_CTL(ring, tmp);
>               return true;
>       }
> +     if (INTEL_INFO(dev)->gen >= 6 &&
> +         tmp & RING_WAIT_SEMAPHORE &&
> +         semaphore_passed(ring)) {
> +             DRM_ERROR("Kicking stuck semaphore on %s\n",
> +                       ring->name);
> +             I915_WRITE_CTL(ring, tmp);
> +             return true;
> +     }
>       return false;
>  }

I'd go with just gen == 6, for now. Also, since we know the waiter from
your horrifically complex calculation above, maybe we can get that info in
the DRM_ERROR as well?

On the idea:
Acked-by: Ben Widawsky <[email protected]>

I'll happily take time to turn it into an r-b if it fixes a bug.

-- 
Ben Widawsky, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to