sched: Always free the job after the timeout

Tvrtko Ursulin Tue, 06 May 2025 07:18:52 -0700


On 06/05/2025 14:38, Maíra Canal wrote:

Hi Tvrtko,

On 06/05/25 10:28, Tvrtko Ursulin wrote:
On 06/05/2025 13:46, Maíra Canal wrote:
Hi Tvrtko,

Thanks for your review!

On 06/05/25 08:49, Tvrtko Ursulin wrote:
On 03/05/2025 21:59, Maíra Canal wrote:
Currently, if we add the assertions presented in this commit to themock
scheduler, we will see the following output:
[15:47:08] ============== [PASSED] drm_sched_basic_tests==============[15:47:08] ======== drm_sched_basic_timeout_tests (1 subtest)=========[15:47:08] # drm_sched_basic_timeout: ASSERTION FAILED at drivers/gpu/ drm/scheduler/tests/tests_basic.c:246[15:47:08] Expected list_empty(&sched->job_list) to be true, but isfalse
[15:47:08] [FAILED] drm_sched_basic_timeout
[15:47:08] # module: drm_sched_tests
This occurs because `mock_sched_timedout_job()` doesn't properlyhandle
the hang. From the DRM sched documentation, `drm_sched_stop()` and
`drm_sched_start()` are typically used for reset recovery. If these
functions are not used, the offending job won't be freed and should be
freed by the caller.
Currently, the mock scheduler doesn't use the functions provided bytheAPI, nor does it handle the freeing of the job. As a result, thejob isn't
removed from the job list.
For the record the job does gets freed via the kunit managedallocation.
Sorry, I didn't express myself correctly. Indeed, it is. I meant that
the DRM scheduler didn't free the job.
It was a design choice for this test to be a *strict* unit testwhich tests only a _single_ thing. And that is that thetimedout_job() hook gets called. As such the hook was implemented tosatisfy that single requirement only.
What do you think about checking that `sched->job_list` won't be empty?

I wanted to add such assertion to make sure that the behavior of the
timeout won't change in future (e.g. a patch makes a change that calls
`free_job()` for the guilty job at timeout). Does it make sense to you?
Where would that assert be?
I believe it would be in the same place as this patch assertions, but
instead of `KUNIT_ASSERT_TRUE(test, list_empty(&sched->job_list));`, it
would be `KUNIT_ASSERT_FALSE(test, list_empty(&sched->job_list));`.

But I don't feel strongly about it. I can drop the patch if you believe
it's a better option.

I don't mind this patch, I was just explaining how the current test wasdeliberately testing a single thing. But we can change it to test more,that's fine.

In this case it would go from testing that the timeout callback fires totesting the callback fires and something else happens.


Key is to define the "something else" part so it is not timing sensitive.

The drm_sched_stop+delay+drm_sched_start approach would perhaps work ifyou would signal the job with an errno set before drm_sched_stop?

Then it would hit the "free_guilty" path in drm_sched_stop and wouldn'tbe timing sensitive. As long as there aren't any locking considerationsI am missing. That could then safely have the two list_empty asserts.


Regards,

Tvrtko

Re: [PATCH 2/8] drm/sched: Always free the job after the timeout

Reply via email to