Quoting Michel Thierry (2017-05-22 18:46:24)
> +       /* try engine reset first, and continue if fails */

/* Please use sentences when convenient. It looks much neater that way. */

> +       if (intel_has_reset_engine(dev_priv)) {
> +               struct intel_engine_cs *engine;
> +               unsigned int tmp;
> +
> +               /* protect against concurrent reset attempts */
> +               if (test_and_set_bit(I915_RESET_ENGINE_IN_PROGRESS,
> +                                    &dev_priv->gpu_error.flags))
> +                       goto out;
> +
> +               for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
> +                       if (i915_reset_engine(engine) == 0)
> +                               engine_mask &= ~intel_engine_flag(engine);
> +               }
> +
> +               /* clear unconditionally, full reset won't care about it */
> +               clear_bit(I915_RESET_ENGINE_IN_PROGRESS,
> +                         &dev_priv->gpu_error.flags);
> +
> +               if (!engine_mask)
> +                       goto out;
> +       }
> +
> +       /* full reset needs the mutex, stop any other user trying to do so */
>         if (test_and_set_bit(I915_RESET_BACKOFF,
>                              &dev_priv->gpu_error.flags))

We have a big gaping hole here in coordination between a global reset
and per-engine resets.

I think you want to do something like:

if (intel_has_engine_reset()) {
        for_each_engine_mask() {
                if (test_and_set_bit(I915_RESET_ENGINE + engine->id))
                        continue;

                if (i915_reset_engine() == 0)
                        engine_mask &= ~engine->mask;

                clear_bit(I915_RESET_ENGINE + engine->id);
                wake_up_bit(I915_RESET_ENGINE + engine->id);
        }
}

if (!engine_mask)
        goto out;

if (test_and_set_bit(I915_RESET_BACKOFF))
        goto out;

for_each_engine() {
        wait_on_bit(I915_RESET_ENGINE + engine->id);
        set_bit(I915_RESET_ENGINE);
}

...do global reset...

for_each_engine() {
        clear_bit(I915_RESET_ENGINE);
}

The idea is that any per-engine reset that comes in whilst global is in
progress can skip, and that the global waits for any per-engine reset to
finish before clobbering state. The window in which global reset
completes and a new hang is detected before we clear the bits, I am
judiciously ignoring. Also this should allow us to perform parallel
resets between engines. (Now that's a selling point!)
-Chris
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to