On 06/05/2025 14:38, Maíra Canal wrote:
Hi Tvrtko,

On 06/05/25 10:28, Tvrtko Ursulin wrote:

On 06/05/2025 13:46, Maíra Canal wrote:
Hi Tvrtko,

Thanks for your review!

On 06/05/25 08:49, Tvrtko Ursulin wrote:

On 03/05/2025 21:59, Maíra Canal wrote:
Currently, if we add the assertions presented in this commit to the mock
scheduler, we will see the following output:

[15:47:08] ============== [PASSED] drm_sched_basic_tests ============== [15:47:08] ======== drm_sched_basic_timeout_tests (1 subtest) ========= [15:47:08] # drm_sched_basic_timeout: ASSERTION FAILED at drivers/ gpu/ drm/scheduler/tests/tests_basic.c:246 [15:47:08] Expected list_empty(&sched->job_list) to be true, but is false
[15:47:08] [FAILED] drm_sched_basic_timeout
[15:47:08] # module: drm_sched_tests

This occurs because `mock_sched_timedout_job()` doesn't properly handle
the hang. From the DRM sched documentation, `drm_sched_stop()` and
`drm_sched_start()` are typically used for reset recovery. If these
functions are not used, the offending job won't be freed and should be
freed by the caller.

Currently, the mock scheduler doesn't use the functions provided by the API, nor does it handle the freeing of the job. As a result, the job isn't
removed from the job list.

For the record the job does gets freed via the kunit managed allocation.

Sorry, I didn't express myself correctly. Indeed, it is. I meant that
the DRM scheduler didn't free the job.


It was a design choice for this test to be a *strict* unit test which tests only a _single_ thing. And that is that the timedout_job() hook gets called. As such the hook was implemented to satisfy that single requirement only.


What do you think about checking that `sched->job_list` won't be empty?

I wanted to add such assertion to make sure that the behavior of the
timeout won't change in future (e.g. a patch makes a change that calls
`free_job()` for the guilty job at timeout). Does it make sense to you?

Where would that assert be?


I believe it would be in the same place as this patch assertions, but
instead of `KUNIT_ASSERT_TRUE(test, list_empty(&sched->job_list));`, it
would be `KUNIT_ASSERT_FALSE(test, list_empty(&sched->job_list));`.

But I don't feel strongly about it. I can drop the patch if you believe
it's a better option.

I don't mind this patch, I was just explaining how the current test was deliberately testing a single thing. But we can change it to test more, that's fine.

In this case it would go from testing that the timeout callback fires to testing the callback fires and something else happens.

Key is to define the "something else" part so it is not timing sensitive.

The drm_sched_stop+delay+drm_sched_start approach would perhaps work if you would signal the job with an errno set before drm_sched_stop?

Then it would hit the "free_guilty" path in drm_sched_stop and wouldn't be timing sensitive. As long as there aren't any locking considerations I am missing. That could then safely have the two list_empty asserts.

Regards,

Tvrtko

Reply via email to