On Wed, 2026-01-28 at 12:39 +0100, Marco Pagani wrote: > > On 22/01/2026 10:51, Tvrtko Ursulin wrote: > > > > On 20/01/2026 20:52, Marco Pagani wrote: > > > Add a new test suite to simulate concurrent job submissions from > > > userspace. > > > The suite includes a basic test case where each worker submits a single > > > job, and a more advanced case involving the submission of multiple jobs. > > > > New test coverage is welcome! > > Hi Tvrtko, Philip, and thank you. > > > But as Philipp has said some more context would be beneficial. Like are > > you trying to hit a bug, or extend later with something which will hit a > > bug and then you will propose improvements? Or simply improving the > > coverage? > > Sure, I'll extend the commit message to be more descriptive in the next > version. > > > If it is about some race I would maybe consider putting this into a new > > tests_races.c. I actually have this file locally and some unfinished > > test cases already, although it is unclear when I will be happy with > > them to post. But if the test is simply about adding coverage it is fine > > to live in tests_basic.c. > > The general idea is to extend the suite with some initial tests that stress > concurrency to spot race conditions. Having these initial tests grouped > together > with future ones in a new tests_races.c file makes perfect sense to me.
I am not so convinced of a separate file. All the drm_sched tests are there to ensure the soundness and robustness of the scheduler. How is a race special? What would tests_basic.c be for – checking for bugs that are not races? And races.c would be full of tests that are threaded? (questions more directed at Tvrtko) >From my POV having this little test in tests_basic.c is perfectly fine. Having multiple entities per scheduler, associated with multiple task, *is* basic functionality of the scheduler. > > > > Signed-off-by: Marco Pagani <[email protected]> > > > --- > > > drivers/gpu/drm/scheduler/tests/tests_basic.c | 175 ++++++++++++++++++ > > > 1 file changed, 175 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c > > > b/drivers/gpu/drm/scheduler/tests/tests_basic.c > > > index 82a41a456b0a..7c25bcbbe7c9 100644 > > > --- a/drivers/gpu/drm/scheduler/tests/tests_basic.c > > > +++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c > > > @@ -2,6 +2,7 @@ > > > /* Copyright (c) 2025 Valve Corporation */ > > > > > > #include <linux/delay.h> > > > +#include <linux/completion.h> > > > > > > #include "sched_tests.h" > > > > > > @@ -235,6 +236,179 @@ static void drm_sched_basic_cancel(struct kunit > > > *test) > > > KUNIT_ASSERT_EQ(test, job->hw_fence.error, -ECANCELED); > > > } > > > > > > +struct sched_concurrent_test_context { > > > + struct drm_mock_scheduler *sched; > > > + struct workqueue_struct *sub_wq; > > > + struct completion wait_go; > > > +}; > > > + > > > +KUNIT_DEFINE_ACTION_WRAPPER(destroy_workqueue_wrap, destroy_workqueue, > > > + struct workqueue_struct *); > > > + > > > +KUNIT_DEFINE_ACTION_WRAPPER(drm_mock_sched_fini_wrap, > > > drm_mock_sched_fini, > > > + struct drm_mock_scheduler *); > > > + > > > +KUNIT_DEFINE_ACTION_WRAPPER(drm_mock_sched_entity_free_wrap, > > > drm_mock_sched_entity_free, > > > + struct drm_mock_sched_entity *); > > > + > > > +static int drm_sched_concurrent_init(struct kunit *test) > > > +{ > > > + struct sched_concurrent_test_context *ctx; > > > + int ret; > > > + > > > + ctx = kunit_kzalloc(test, sizeof(*ctx), GFP_KERNEL); > > > + > > > + init_completion(&ctx->wait_go); > > > + > > > + ctx->sched = drm_mock_sched_new(test, MAX_SCHEDULE_TIMEOUT); > > > + > > > + ret = kunit_add_action_or_reset(test, drm_mock_sched_fini_wrap, > > > ctx->sched); > > > + KUNIT_ASSERT_EQ(test, ret, 0); > > > + > > > + /* Use an unbounded workqueue to maximize job submission concurrency */ > > > + ctx->sub_wq = alloc_workqueue("drm-sched-submitters-wq", WQ_UNBOUND, > > > + WQ_UNBOUND_MAX_ACTIVE); > > > + KUNIT_ASSERT_NOT_NULL(test, ctx->sub_wq); > > > + > > > + ret = kunit_add_action_or_reset(test, destroy_workqueue_wrap, > > > ctx->sub_wq); > > > + KUNIT_ASSERT_EQ(test, ret, 0); > > > + > > > + test->priv = ctx; > > > + > > > + return 0; > > > +} > > > + > > > +struct drm_sched_concurrent_params { > > > + const char *description; > > > + unsigned int job_base_us; > > > + unsigned int num_jobs; > > > + unsigned int num_subs; > > > +}; > > > + > > > +static const struct drm_sched_concurrent_params > > > drm_sched_concurrent_cases[] = { > > > + { > > > + .description = "Concurrently submit a single job in a single > > > entity", > > > + .job_base_us = 1000, > > > + .num_jobs = 1, > > > + .num_subs = 32, > > > + }, > > > > Why is submission from a single thread interesting if it is already covered? > > These two initial parameter sets cover only concurrent submission: > multiple submitters, single job / multiple submitters, multiple jobs. > > > > + { > > > + .description = "Concurrently submit multiple jobs in a single > > > entity", > > > + .job_base_us = 1000, > > > + .num_jobs = 10, > > > + .num_subs = 64, > > > + }, > > > +}; > > > + > > > +static void > > > +drm_sched_concurrent_desc(const struct drm_sched_concurrent_params > > > *params, char *desc) > > > +{ > > > + strscpy(desc, params->description, KUNIT_PARAM_DESC_SIZE); > > > +} > > > + > > > +KUNIT_ARRAY_PARAM(drm_sched_concurrent, drm_sched_concurrent_cases, > > > drm_sched_concurrent_desc); > > > + > > > +struct submitter_data { > > > + struct work_struct work; > > > + struct sched_concurrent_test_context *ctx; > > > + struct drm_mock_sched_entity *entity; > > > + struct drm_mock_sched_job **jobs; > > > + struct kunit *test; > > > + unsigned int id; > > > + bool timedout; > > > +}; > > > + > > > +static void drm_sched_submitter_worker(struct work_struct *work) > > > +{ > > > + const struct drm_sched_concurrent_params *params; > > > + struct sched_concurrent_test_context *ctx; > > > + struct submitter_data *sub_data; > > > + unsigned int i, duration_us; > > > + unsigned long timeout_jiffies; > > > + bool done; > > > + > > > + sub_data = container_of(work, struct submitter_data, work); > > > + ctx = sub_data->ctx; > > > + params = sub_data->test->param_value; > > > + > > > + wait_for_completion(&ctx->wait_go); > > > + > > > + for (i = 0; i < params->num_jobs; i++) { > > > + duration_us = params->job_base_us + (sub_data->id * 10); > > > > Why is job duration dependent by the submitter id? > > Just a simple way to have a deterministic distribution of durations. > I can change it if it doesn't fit. > > > Would it be feasiable to not use auto-completing jobs and instead > > advance the timeline manually? Given how the premise of the test seems > > to be about concurrent submission it sounds plausible that what happens > > after submission maybe isn't very relevant. > > Good idea! I'll run some experiments and see if it works. > > > > + drm_mock_sched_job_set_duration_us(sub_data->jobs[i], > > > duration_us); > > > + drm_mock_sched_job_submit(sub_data->jobs[i]); > > > > On a related note, one interesting thing to add coverage for later is > > multi-threaded submit of multiple jobs against a single entity. But it > > is not an immediate need. Just mentioning it as something interesting. > > Currently, the test configures each submitter to submit multiple jobs > against its own dedicated entity. I considered adding a test case for > submitting multiple jobs against multiple entities, but I decided to > leave it for the future. > > > It would mean open coding drm_mock_sched_job_submit() as > > drm_sched_job_arm() and drm_sched_entity_push_job() and sticking some > > delay in between so two threads have the chance to interleave. Mock > > scheduler does not handle it today, neither does the scheduler itself > > who punts responsibility to callers. So adding a test and making the > > mock scheduler handle that properly would serve as an example on how > > scheduler must be used. Or what can go bad if it isn't. > > Do you mean having multiple (k)threads submitting against the same entity? > Would that be used to model a multithread application that uses multiple > queues? Submitters to entities must always ensure their own synchronization before pushing to it. And afterwards it's the scheduler's parallelism that counts. I don't see how multiple threads on an entity are worthwhile testing. It's beyond our scope. Thx. P. > > > > + } > > > + > > > + timeout_jiffies = usecs_to_jiffies(params->job_base_us * > > > params->num_subs * > > > + params->num_jobs * 10); > > > > The timeout calculation could use a comment. You are using num_subs * 10 > > to match the duratiot_us above being id * 10? With logic of calculating > > a pessimistic timeout? > > > > Have you tried it with qemu to check if it is pessimistic enough? > > I'll double check on that. > > > > + for (i = 0; i < params->num_jobs; i++) { > > > + done = drm_mock_sched_job_wait_finished(sub_data->jobs[i], > > > + timeout_jiffies); > > > + if (!done) > > > + sub_data->timedout = true; > > > + } > > > > Technically you only need to wait on the last job but it is passable > > like this too. > > > > Also, is it important for the worker to wait for completion or the main > > thread could simply wait for everything? Maybe that would simplify things. > > I would say they serve different purposes. The completion is used to pause > all worker threads until they are all created to ensure they start submitting > jobs together to maximize concurrency. > > > Manual timeline advance and this combined would mean the workers only > > submit jobs, while the main thread simply does > > drm_mock_sched_advance(sched, num_subs * num_jobs) and waits for last > > job from each submitter to finish. > > > > Again, auto-completion and timeout reporting is something I do not > > immediate see is relevant for multi-threaded submission testing. > > > > Maybe if you want to test the mock scheduler itself it could be, but > > then I would add it as separate entry in drm_sched_concurrent_cases[]. > > Like maybe have a flag/boolean "auto-complete jobs". So one without and > > one with that set. > > I think it's a good idea and I'll experiment to see if it works. > > > Other than that it looks tidy and was easy to follow. Only thing which > > slightly got me was the term "subs" since I don't intuitively associate > > it with a submitter but, well, a sub entity of some kind. Might be worth > > renaming that to submitter(s), or even dropping the prefix in some cases > > might be feasible (like sub(s)_data). > > Agreed. I'll rename "subs" for better clarity. > > > Regards, > > > > Tvrtko > > > > Cheers, > Marco
