On 18/03/2026 11:51, Tvrtko Ursulin wrote:
> 
> On 17/03/2026 17:04, Marco Pagani wrote:
>> On 23/02/2026 17:25, Tvrtko Ursulin wrote:
>>>
>>> On 19/02/2026 14:07, Marco Pagani wrote:
>>>> Add a new test suite to simulate concurrent job submissions to the DRM
>>>> scheduler. The suite includes two initial test cases: (i) a primary test
>>>> case for parallel job submission and (ii) a secondary test case for
>>>> interleaved job submission and completion. In the first test case, worker
>>>> threads concurrently submit jobs to the scheduler and then the timeline is
>>>> manually advanced. In the second test case, worker threads periodically
>>>> submit a sequence of jobs to the mock scheduler. Periods are chosen as
>>>> harmonic, starting from a base period, to allow interleaving between
>>>> submission and completion. To avoid impractically large execution times,
>>>> periods are cycled. The timeline is advanced automatically when the jobs
>>>> complete. This models how job submission from userspace (in process
>>>> context) may interleave with job completion (hrtimer callback interrupt
>>>> context, in the test case) by a physical device.
>>>>
>>>> Signed-off-by: Marco Pagani <[email protected]>
>>>> ---
>>>> Changes in v1:
>>>> - Split the original suite into two test cases (Tvrtko Ursulin).
>>>> - General test refactoring and variable renaming for clarity.
>>>> - Changed parameters to have a fairer workload distribution.
>>>> - Improved cleanup logic.
>>>> - Added kunit_info() prints for tracing workers.
>>>> ---
>>>>    drivers/gpu/drm/scheduler/tests/tests_basic.c | 338 ++++++++++++++++++
>>>>    1 file changed, 338 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c b/drivers/gpu/drm/scheduler/tests/tests_basic.c
>>>> index 82a41a456b0a..391edcbaf43a 100644
>>>> --- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
>>>> +++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
>>>> @@ -2,6 +2,8 @@
>>>>    /* Copyright (c) 2025 Valve Corporation */
>>>>    
>>>>    #include <linux/delay.h>
>>>> +#include <linux/completion.h>
>>>> +#include <linux/workqueue.h>
>>>>    
>>>>    #include "sched_tests.h"
>>>>    
>>>> @@ -235,6 +237,341 @@ static void drm_sched_basic_cancel(struct kunit *test)
>>>>            KUNIT_ASSERT_EQ(test, job->hw_fence.error, -ECANCELED);
>>>>    }
>>>>    
>>>> +struct sched_concurrent_context {
>>>> +  struct drm_mock_scheduler *sched;
>>>> +  struct workqueue_struct *sub_wq;
>>>> +  struct kunit *test;
>>>> +  struct completion wait_go;
>>>> +};
>>>> +
>>>> +KUNIT_DEFINE_ACTION_WRAPPER(drm_mock_sched_fini_wrap, drm_mock_sched_fini,
>>>> +                      struct drm_mock_scheduler *);
>>>> +
>>>> +KUNIT_DEFINE_ACTION_WRAPPER(drm_mock_sched_entity_free_wrap, drm_mock_sched_entity_free,
>>>> +                      struct drm_mock_sched_entity *);
>>>> +
>>>> +static void complete_destroy_workqueue(void *context)
>>>> +{
>>>> +  struct sched_concurrent_context *ctx = (struct sched_concurrent_context *)context;
>>>> +
>>>> +  complete_all(&ctx->wait_go);
>>>> +
>>>> +  destroy_workqueue(ctx->sub_wq);
>>>> +}
>>>> +
>>>> +static int drm_sched_concurrent_init(struct kunit *test)
>>>> +{
>>>> +  struct sched_concurrent_context *ctx;
>>>> +  int ret;
>>>> +
>>>> +  ctx = kunit_kzalloc(test, sizeof(*ctx), GFP_KERNEL);
>>>> +  KUNIT_ASSERT_NOT_NULL(test, ctx);
>>>> +
>>>> +  init_completion(&ctx->wait_go);
>>>> +
>>>> +  ctx->sched = drm_mock_sched_new(test, MAX_SCHEDULE_TIMEOUT);
>>>> +
>>>> +  ret = kunit_add_action_or_reset(test, drm_mock_sched_fini_wrap, ctx->sched);
>>>> +  KUNIT_ASSERT_EQ(test, ret, 0);
>>>> +
>>>> +  /* Use an unbounded workqueue to maximize job submission concurrency */
>>>> +  ctx->sub_wq = alloc_workqueue("drm-sched-submitters-wq", WQ_UNBOUND,
>>>> +                                WQ_UNBOUND_MAX_ACTIVE);
>>>> +  KUNIT_ASSERT_NOT_NULL(test, ctx->sub_wq);
>>>> +
>>>> +  ret = kunit_add_action_or_reset(test, complete_destroy_workqueue, ctx);
>>>> +  KUNIT_ASSERT_EQ(test, ret, 0);
>>>> +
>>>> +  ctx->test = test;
>>>> +  test->priv = ctx;
>>>> +
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +struct drm_sched_parallel_params {
>>>> +  const char *description;
>>>> +  unsigned int num_jobs;
>>>> +  unsigned int num_workers;
>>>> +};
>>>> +
>>>> +static const struct drm_sched_parallel_params drm_sched_parallel_cases[] = {
>>>> +  {
>>>> +          .description = "Workers submitting multiple jobs against a single entity",
>>>
>>> Each worker has own entity so the description is a bit confusing.
>>
>> Right, I'll change it to "Multiple workers submitting multiple jobs from
>> their entity", as you suggested in the following email.
>>
>>>> +          .num_jobs = 4,
>>>> +          .num_workers = 16,
>>>> +  },
>>>> +};
>>>> +
>>>> +static void
>>>> +drm_sched_parallel_desc(const struct drm_sched_parallel_params *params, char *desc)
>>>> +{
>>>> +  strscpy(desc, params->description, KUNIT_PARAM_DESC_SIZE);
>>>> +}
>>>> +
>>>> +KUNIT_ARRAY_PARAM(drm_sched_parallel, drm_sched_parallel_cases, drm_sched_parallel_desc);
>>>> +
>>>> +struct parallel_worker {
>>>> +  struct work_struct work;
>>>> +  struct sched_concurrent_context *ctx;
>>>> +  struct drm_mock_sched_entity *entity;
>>>> +  struct drm_mock_sched_job **jobs;
>>>> +  unsigned int id;
>>>> +};
>>>> +
>>>> +static void drm_sched_parallel_worker(struct work_struct *work)
>>>> +{
>>>> +  const struct drm_sched_parallel_params *params;
>>>> +  struct sched_concurrent_context *test_ctx;
>>>> +  struct parallel_worker *worker;
>>>> +  unsigned int i;
>>>> +
>>>> +  worker = container_of(work, struct parallel_worker, work);
>>>> +  test_ctx = worker->ctx;
>>>> +  params = test_ctx->test->param_value;
>>>> +
>>>> +  wait_for_completion(&test_ctx->wait_go);
>>>> +
>>>> +  kunit_info(test_ctx->test, "Parallel worker %u started\n", worker->id);
>>>> +
>>>> +  for (i = 0; i < params->num_jobs; i++)
>>>> +          drm_mock_sched_job_submit(worker->jobs[i]);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Spawns workers that submit a sequence of jobs to the mock scheduler.
>>>> + * Once all jobs are submitted, the timeline is manually advanced.
>>>> + */
>>>> +static void drm_sched_parallel_submit_test(struct kunit *test)
>>>> +{
>>>> +  struct sched_concurrent_context *ctx = test->priv;
>>>> +  const struct drm_sched_parallel_params *params = test->param_value;
>>>> +  struct parallel_worker *workers;
>>>> +  unsigned int i, j, completed_jobs;
>>>> +  bool done;
>>>> +  int ret;
>>>> +
>>>> +  workers = kunit_kcalloc(test, params->num_workers, sizeof(*workers),
>>>> +                          GFP_KERNEL);
>>>> +  KUNIT_ASSERT_NOT_NULL(test, workers);
>>>> +
>>>> +  /*
>>>> +   * Init workers only after all jobs and entities have been successfully
>>>> +   * allocated. In this way, the cleanup logic for when an assertion fails
>>>> +   * can be simplified.
>>>> +   */
>>>> +  for (i = 0; i < params->num_workers; i++) {
>>>> +          workers[i].id = i;
>>>> +          workers[i].ctx = ctx;
>>>> +          workers[i].entity = drm_mock_sched_entity_new(test,
>>>> +                                                        DRM_SCHED_PRIORITY_NORMAL,
>>>> +                                                        ctx->sched);
>>>> +
>>>> +          ret = kunit_add_action_or_reset(test, drm_mock_sched_entity_free_wrap,
>>>> +                                          workers[i].entity);
>>>> +          KUNIT_ASSERT_EQ(test, ret, 0);
>>>> +
>>>> +          workers[i].jobs = kunit_kcalloc(test, params->num_jobs,
>>>> +                                          sizeof(*workers[i].jobs), GFP_KERNEL);
>>>> +          KUNIT_ASSERT_NOT_NULL(test, workers[i].jobs);
>>>> +
>>>> +          for (j = 0; j < params->num_jobs; j++)
>>>> +                  workers[i].jobs[j] = drm_mock_sched_job_new(test,
>>>> +                                                              workers[i].entity);
>>>> +  }
>>>> +
>>>> +  for (i = 0; i < params->num_workers; i++) {
>>>> +          INIT_WORK(&workers[i].work, drm_sched_parallel_worker);
>>>> +          queue_work(ctx->sub_wq, &workers[i].work);
>>>> +  }
>>>> +
>>>> +  complete_all(&ctx->wait_go);
>>>> +  flush_workqueue(ctx->sub_wq);
>>>> +
>>>> +  for (i = 0; i < params->num_workers; i++) {
>>>> +          for (j = 0; j < params->num_jobs; j++) {
>>>> +                  done = drm_mock_sched_job_wait_scheduled(workers[i].jobs[j],
>>>> +                                                           HZ);
>>>> +                  KUNIT_ASSERT_TRUE(test, done);
>>>> +
>>>> +                  done = drm_mock_sched_job_is_finished(workers[i].jobs[j]);
>>>> +                  KUNIT_ASSERT_FALSE(test, done);
>>>
>>> This second assert does not seem to be bringing much value, but instead
>>> makes the reader ask themselves why it is there. Remove it?
>>>
>>> Hmm in fact this whole loop could be removed. All that it is needed
>>> below is to wait for the last job to be completed.
>>
>> I'll reply to this in the following mail since Philip joined the 
>> conversation.
>>   
>>>> +          }
>>>> +  }
>>>> +
>>>> +  completed_jobs = drm_mock_sched_advance(ctx->sched,
>>>> +                                          params->num_workers * params->num_jobs);
>>>> +  KUNIT_ASSERT_EQ(test, completed_jobs, params->num_workers * params->num_jobs);
>>>> +
>>>> +  for (i = 0; i < params->num_workers; i++) {
>>>> +          for (j = 0; j < params->num_jobs; j++) {
>>>> +                  done = drm_mock_sched_job_is_finished(workers[i].jobs[j]);
>>>> +                  KUNIT_ASSERT_TRUE(test, done);
>>>> +          }
>>>> +  }
>>>
>>> So here, after advancing you just need to wait on the last job to complete.
>>>
>>> Not a deal breaker to have it verbose but usually the less is better and
>>> easier to spot the key thing being tested and what are the functional steps.
>>>
>>>> +}
>>>> +
>>>> +struct drm_sched_interleaved_params {
>>>> +  const char *description;
>>>> +  unsigned int job_base_period_us;
>>>> +  unsigned int periods_cycle;
>>>> +  unsigned int num_jobs;
>>>> +  unsigned int num_workers;
>>>> +};
>>>> +
>>>> +static const struct drm_sched_interleaved_params drm_sched_interleaved_cases[] = {
>>>> +  {
>>>> +          .description = "Workers submitting single jobs against a single entity",
>>>
>>> As with the parallel description.
>>
>> Agreed.
>>   
>>>> +          .job_base_period_us = 100,
>>>> +          .periods_cycle = 10,
>>>> +          .num_jobs = 1,
>>>> +          .num_workers = 32,
>>>> +  },
>>>> +  {
>>>> +          .description = "Workers submitting multiple jobs against a single entity",
>>>> +          .job_base_period_us = 100,
>>>> +          .periods_cycle = 10,
>>>> +          .num_jobs = 4,
>>>> +          .num_workers = 16,
>>>> +  },
>>>> +};
>>>> +
>>>> +static void
>>>> +drm_sched_interleaved_desc(const struct drm_sched_interleaved_params *params, char *desc)
>>>> +{
>>>> +  strscpy(desc, params->description, KUNIT_PARAM_DESC_SIZE);
>>>> +}
>>>> +
>>>> +KUNIT_ARRAY_PARAM(drm_sched_interleaved, drm_sched_interleaved_cases,
>>>> +            drm_sched_interleaved_desc);
>>>> +
>>>> +struct interleaved_worker {
>>>> +  struct work_struct work;
>>>> +  struct sched_concurrent_context *ctx;
>>>> +  struct drm_mock_sched_entity *entity;
>>>> +  struct drm_mock_sched_job **jobs;
>>>> +  unsigned int id;
>>>> +  unsigned int period_us;
>>>> +  unsigned int timeout_us;
>>>> +};
>>>> +
>>>> +static void drm_sched_interleaved_worker(struct work_struct *work)
>>>> +{
>>>> +  const struct drm_sched_interleaved_params *params;
>>>> +  struct sched_concurrent_context *test_ctx;
>>>> +  struct interleaved_worker *worker;
>>>> +  unsigned int i;
>>>> +  bool done;
>>>> +
>>>> +  worker = container_of(work, struct interleaved_worker, work);
>>>> +  test_ctx = worker->ctx;
>>>> +  params = test_ctx->test->param_value;
>>>> +
>>>> +  wait_for_completion(&test_ctx->wait_go);
>>>> +
>>>> +  kunit_info(test_ctx->test, "Interleaved worker %u with period %u us started\n",
>>>> +             worker->id, worker->period_us);
>>>> +
>>>> +  /* Release jobs as a periodic sequence */
>>>> +  for (i = 0; i < params->num_jobs; i++) {
>>>> +          drm_mock_sched_job_set_duration_us(worker->jobs[i], worker->period_us);
>>>> +          drm_mock_sched_job_submit(worker->jobs[i]);
>>>> +
>>>> +          done = drm_mock_sched_job_wait_finished(worker->jobs[i],
>>>> +                                                  usecs_to_jiffies(worker->timeout_us));
>>>> +          if (!done)
>>>> +                  kunit_info(test_ctx->test, "Job %u of worker %u timed out\n",
>>>> +                             i, worker->id);
>>>> +  }
>>>> +}
>>>> +
>>>> +/*
>>>> + * Spawns workers that submit a periodic sequence of jobs to the mock scheduler.
>>>> + * Uses harmonic periods to allow interleaving and cycles through them to prevent
>>>> + * excessively large execution times. Since the scheduler serializes jobs from all
>>>> + * workers, the timeout is set to the hyperperiod with a 2x safety factor. Entities
>>>> + * and jobs are pre-allocated in the main thread to avoid using KUnit assertions in
>>>> + * the workers.
>>>> + */
>>>> +static void drm_sched_interleaved_submit_test(struct kunit *test)
>>>> +{
>>>> +  struct sched_concurrent_context *ctx = test->priv;
>>>> +  const struct drm_sched_interleaved_params *params = test->param_value;
>>>> +  struct interleaved_worker *workers;
>>>> +  unsigned int period_max_us, timeout_us;
>>>> +  unsigned int i, j;
>>>> +  bool done;
>>>> +  int ret;
>>>> +
>>>> +  workers = kunit_kcalloc(test, params->num_workers, sizeof(*workers),
>>>> +                          GFP_KERNEL);
>>>> +  KUNIT_ASSERT_NOT_NULL(test, workers);
>>>> +
>>>> +  period_max_us = params->job_base_period_us * (1 << params->periods_cycle);
>>>> +  timeout_us = params->num_workers * period_max_us * 2;
>>>> +
>>>> +  /*
>>>> +   * Init workers only after all jobs and entities have been successfully
>>>> +   * allocated. In this way, the cleanup logic for when an assertion fails
>>>> +   * can be simplified.
>>>> +   */
>>>> +  for (i = 0; i < params->num_workers; i++) {
>>>> +          workers[i].id = i;
>>>> +          workers[i].ctx = ctx;
>>>> +          workers[i].timeout_us = timeout_us;
>>>> +
>>>> +          if (i % params->periods_cycle == 0)
>>>> +                  workers[i].period_us = params->job_base_period_us;
>>>> +          else
>>>> +                  workers[i].period_us = workers[i - 1].period_us * 2;
>>>
>>> Are the two if statements effectively equivalent to:
>>>
>>>    period_us = params->job_base_period_us << (i % params->periods_cycle);
>>>
>>> ?
>>
>> Yes. Actually, I was very undecided between using the closed form and the
>> iterative form. In the end, I went with the iterative one because I thought
>> it would be easier to understand and maintain, and this code is not
>> performance-critical. However, I don't have a strong opinion on this.
>> Either way works for me.
> 
> AFAIR, and maybe it is just me, the if-else version with referencing the 
> previous (i - 1) period took me some time to figure out what it is 
> about. My version is obvious to me that it is a repeating cycle (modulo 
> operator) of doubling intervals (<<).

Right, a for loop with (i - 1) can indeed be confusing. I'll take that
into account.

>>
>>> Also, do you have an idea how to locally calculate a suitable
>>> periods_cycle instead of having to come up with a number in the
>>> drm_sched_interleaved_params array. Just strikes me as hard to know what
>>> to put in there if someone would want to add a test.
>>
>> Please see below.
>>   
>>>> +
>>>> +          workers[i].entity = drm_mock_sched_entity_new(test,
>>>> +                                                        DRM_SCHED_PRIORITY_NORMAL,
>>>> +                                                        ctx->sched);
>>>> +
>>>> +          ret = kunit_add_action_or_reset(test, drm_mock_sched_entity_free_wrap,
>>>> +                                          workers[i].entity);
>>>> +          KUNIT_ASSERT_EQ(test, ret, 0);
>>>> +
>>>> +          workers[i].jobs = kunit_kcalloc(test, params->num_jobs,
>>>> +                                          sizeof(*workers[i].jobs), GFP_KERNEL);
>>>> +          KUNIT_ASSERT_NOT_NULL(test, workers[i].jobs);
>>>> +
>>>> +          for (j = 0; j < params->num_jobs; j++) {
>>>> +                  workers[i].jobs[j] = drm_mock_sched_job_new(test,
>>>> +                                                              workers[i].entity);
>>>> +                  done = drm_mock_sched_job_is_finished(workers[i].jobs[j]);
>>>> +                  KUNIT_ASSERT_FALSE(test, done);
>>>
>>> Same as above, this is asserted by a basic test case so I'd just remove it.
>>>
>>>> +          }
>>>> +  }
>>>> +
>>>> +  for (i = 0; i < params->num_workers; i++) {
>>>> +          INIT_WORK(&workers[i].work, drm_sched_interleaved_worker);
>>>> +          queue_work(ctx->sub_wq, &workers[i].work);
>>>> +  }
>>>> +
>>>> +  complete_all(&ctx->wait_go);
>>>> +  flush_workqueue(ctx->sub_wq);
>>>> +
>>>> +  for (i = 0; i < params->num_workers; i++) {
>>>> +          for (j = 0; j < params->num_jobs; j++) {
>>>> +                  done = drm_mock_sched_job_is_finished(workers[i].jobs[j]);
>>>> +                  KUNIT_ASSERT_TRUE(test, done);
>>>> +          }
>>>> +  }
>>>
>>> Here as well you could wait just for the last job per worker.
>>>
>>> On the overal submission pattern - Don't you end up with a very uneven
>>> activity between the workers? Because the job duration is doubling and
>>> workers are single-shot, once the low index workers are done they are
>>> done, and all that remains are the higher ones, and so the wave of fewer
>>> and fewer active workers continues. Low index worker do not end up
>>> racing with the job completions of the high index workers but only
>>> between themselves, and even that with the cycle=10 workers=16 case is
>>> even more limited.
>>
>> Good point.
>>
>>> Alternative approach could be to set a per test time budget and just
>>> keep the workers submitting until over. It would be simpler to
>>> understand and there would be more submit/complete overlap.
>>
>> I agree. Using a test time budget and having workers continuously
>> submit jobs until it expires would make better use of the test time.
>> I'm thinking that the simplest and most straightforward approach would
>> be to cyclically distribute periods among workers until they reach the
>> largest possible value below the test duration, which would coincide
>> with the hyperperiod. This would also solve the issue of selecting a
>> suitable periods_cycle parameter that you mentioned earlier.
>> In practice, something like this:
>>
>> drm_sched_interleaved_params [...]
>> {
>>          .num_workers = N
>>          .test_max_time = T
>>          .job_base_period_us = P         /* Small GPU job, 100 us */
>> }
>>
>> period_us = job_base_period_us;
>> for (i = 0; i < params->num_workers; i++) {
>>          workers[i].period_us = period_us;
>>          period_us *= 2;
>>          if (period_us > test_max_time)
>>                  period_us = job_base_period_us;
>> }
>>
>>
>> What do you think?
> 
> 
> Again some time has passed so rather than going to re-read your patch I 
> will go from memory. IIRC I was thinking something really dumb and 100% 
> time bound with no need to think when coding and reviewing. Each thread 
> simply does:
> 
> ktime_t start = ktime_get();
> 
> do {
>   .. thread doing its submission pattern thing ..
> } while (ktime_to_ms(ktime_sub(ktime_get(), start)) < test->time_ms);
> 
> May miss the time target by a job period_us but who cares.

Sorry for the delay. I got pulled into other things. I left out the worker
execution part since we already agreed on that. Instead, I've replied with
some pseudocode describing a new strategy for period assignments from test
parameters that takes into account your comments.

Thanks,
Marco
