Quoting Tvrtko Ursulin (2020-11-04 12:20:42)
> From: Tvrtko Ursulin <[email protected]>
>
> Between events which trigger engine and GPU resets and capturing the error
> state we lose information on which engine triggered the reset. Improve
> this by passing in the hung engine mask down to error capture.
>
> Result is that the list of engines in user visible "GPU HANG: ecode
> <gen>:<engines>:<ecode>, <process>" is now a list of hanging and not just
> active engines. Most importantly the displayed process is now the one
> which was actually hung.
You could also suggest including only the hanging engine in the report,
as that is intended to be the normal means of generating the report.
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h
> b/drivers/gpu/drm/i915/i915_gpu_error.h
> index 0220b0992808..3a7ca90a3436 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
> @@ -59,6 +59,7 @@ struct i915_request_coredump {
> struct intel_engine_coredump {
> const struct intel_engine_cs *engine;
>
> + bool hung;
> bool simulated;
> u32 reset_count;
>
> @@ -218,8 +219,10 @@ struct drm_i915_error_state_buf {
> __printf(2, 3)
> void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f,
> ...);
>
> -struct i915_gpu_coredump *i915_gpu_coredump(struct drm_i915_private *i915);
> -void i915_capture_error_state(struct drm_i915_private *i915);
> +struct i915_gpu_coredump *i915_gpu_coredump(struct intel_gt *gt,
> + intel_engine_mask_t engine_mask);
> +void i915_capture_error_state(struct intel_gt *gt,
> + intel_engine_mask_t engine_mask);
Don't forget the stubs.
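As a rough sketch only: the no-op stub for the build without error capture
would need the same new signature as the prototype above. The forward
declaration and the typedef below are stand-ins so the fragment compiles on
its own, and which stubs actually exist in i915_gpu_error.h is an assumption
to check against the tree.

```c
#include <stdint.h>

/* Stand-ins so this compiles outside the kernel tree (assumptions). */
struct intel_gt;
typedef uint32_t intel_engine_mask_t;

/*
 * Hypothetical stub for the !CONFIG_DRM_I915_CAPTURE_ERROR case,
 * updated to match the new (gt, engine_mask) prototype from the patch.
 */
static inline void
i915_capture_error_state(struct intel_gt *gt, intel_engine_mask_t engine_mask)
{
	/* Error capture compiled out: nothing to record. */
	(void)gt;
	(void)engine_mask;
}
```

If the stub keeps the old drm_i915_private signature, callers converted by
this patch would fail to build with error capture disabled.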
-Chris