On 06/03/2025 10:37, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2025-03-04 16:43:45)

On 04/03/2025 13:09, Mikolaj Wasiak wrote:
This test exposes a bug in Tigerlake hardware that prevents it from
succeeding. Since the tested feature is only available on the affected
hardware and we won't support any new hardware, this test is obsolete and
should be removed.

I randomly clicked on one TGL, one DG2, one MTL and one RKL in the CI
and saw only passing results. Then I looked at the patch below to see if
there is a skip condition, but I don't see one. So I end up confused, since
the commit message makes it sound like this only exists on Tigerlake and
that it fails all the time. Is it perhaps a sporadic failure? On all
platforms or just TGL? What am I missing?

The HW issue affects all gen12 platforms currently supported by i915. I
don't have any data for derivatives, so I cannot confirm whether this bug
was fixed. The lrc_timestamp test was written to demonstrate this HW bug, to
isolate it from (and explain) the pphwsp runtime discrepancies, covered
by another selftest. The question is whether we want to keep a selftest
that is expected to sporadically fail, that exists purely to hunt for
those failures.

In the past, we have kept such selftests, but hidden them behind
!IS_ENABLED(CONFIG_DRM_I915_SELFTEST_BROKEN).
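
For illustration, the guard pattern mentioned above could look roughly like the sketch below. This is a userspace stand-in, not the i915 code: SELFTEST_BROKEN replaces the kernel's IS_ENABLED(CONFIG_DRM_I915_SELFTEST_BROKEN) check, and both function names and the failing test body are hypothetical placeholders.

```c
#include <stdio.h>

/*
 * Assumption: SELFTEST_BROKEN is a userspace stand-in for the kernel's
 * IS_ENABLED(CONFIG_DRM_I915_SELFTEST_BROKEN). Build with
 * -DSELFTEST_BROKEN=1 to opt in to tests that are known to fail.
 */
#ifndef SELFTEST_BROKEN
#define SELFTEST_BROKEN 0
#endif

/* Hypothetical test body: always reports the sporadic HW failure. */
static int run_timestamp_check(void)
{
	return -1;
}

/* Returns 0 on pass or skip, negative on failure. */
static int guarded_timestamp_selftest(void)
{
	/* Known-bad test: skip unless the builder explicitly opted in. */
	if (!SELFTEST_BROKEN) {
		printf("timestamp selftest skipped: known HW issue\n");
		return 0;
	}

	return run_timestamp_check();
}
```

With the default build the test reports a skip as success, so it stays out of BAT results while the code (and the record of the HW issue) remains in the tree.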

So,
- keep the selftest and expect sporadic failures in BAT, or

Up to Intel - it's not the first sporadically failing test, and in the past those were at least handled.

- remove the selftest and completely forget about the HW issue, or
- hide the selftest and stop it running on known bad platforms?

Either of these two is also fine, I think. If removal is chosen, please make sure that a comment briefly explaining the above either already exists in the code at a suitable location or is added together with the removal. The commit message should also be improved so that it is less misleading about the failure frequency.

Regards,

Tvrtko
