On Tue, May 07, 2024 at 10:09:18AM +0100, Tvrtko Ursulin wrote:
> 
> On 07/05/2024 00:23, Matthew Brost wrote:
> > On Thu, May 02, 2024 at 03:33:50PM +0100, Tvrtko Ursulin wrote:
> > > 
> > > Hi all,
> > > 
> > > Continuing after the brief IRC discussion yesterday regarding work queues
> > > being prone to deadlocks or not, I had a browse around the code base and
> > > ended up a bit confused.
> > > 
> > > When drm_sched_init documents and allocates an *ordered* wq, if no
> > > custom one was provided, could someone remind me whether the ordered
> > > property is fundamental for something to work correctly? Like run_job
> > > vs free_job ordering?
> > > 
> > 
> > Before the work queue conversion (back in the kthread design), run_job &
> > free_job were ordered. It was decided not to break this existing
> > behavior.
> 
> Simply for extra paranoia, or do you remember whether a specific reason
> was identified?
> 

Not to break existing behavior. I can dig up the entire thread for
reference if needed.
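
For reference, the relevant bit of drm_sched_init looks roughly like this
(paraphrased from memory, so exact parameter names, flags, and error
handling in sched_main.c may differ):

	/* Paraphrased sketch, not a verbatim copy of sched_main.c. */
	if (submit_wq) {
		/* Driver supplied its own wq and owns its ordering semantics. */
		sched->submit_wq = submit_wq;
		sched->own_submit_wq = false;
	} else {
		/*
		 * Default: ordered wq, preserving the old kthread behavior
		 * where run_job and free_job never run concurrently.
		 */
		sched->submit_wq = alloc_ordered_workqueue(name, 0);
		if (!sched->submit_wq)
			return -ENOMEM;
		sched->own_submit_wq = true;
	}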

> > > I ask because it appears different drivers do different things, and at
> > > the moment it looks like we have all possible combos of
> > > ordered/unordered, bound and unbound, shared or not shared with the
> > > timeout wq, or even unbound for the timeout wq.
> > > 
> > > The drivers worth looking at in this respect are probably nouveau,
> > > panthor, pvr and xe.
> > > 
> > > Nouveau also talks about a dependency between run_job and free_job and
> > > goes on to create two unordered wqs.
> > > 
> > > Then xe looks a bit funky with the workaround/hack for lockdep where it
> > > creates 512 work queues and hands them over to user queues in
> > > round-robin fashion (instead of the default 1:1), which I suspect is a
> > > problem that would apply to any 1:1 driver given a thorough enough test
> > > suite.
> > > 
> > 
> > I think lockdep ran out of chains or something when executing some wild
> > IGT with 1:1. Yes, any driver with a wild enough test would likely hit
> > this lockdep splat too. Using a pool is probably not a bad idea either.
> 
> I wonder what is different between that and having a single shared unbound
> queue and letting the kernel manage the concurrency? Both this..
> 

Each action (run_job, free_job, and the Xe-specific process msg) has its
own work item on the DRM scheduler work queue. In Xe, these operations
must be ordered, or strictly speaking, must not execute in parallel
within a DRM sched entity/scheduler. With a single shared unbound queue,
that guarantee breaks.
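
As a sketch of what I mean (simplified, not the actual Xe code; the work
item names here are illustrative):

	/*
	 * All three items go on the same ordered wq, so they can never
	 * execute in parallel and effectively serialize per scheduler
	 * without any extra locking.
	 */
	struct workqueue_struct *wq = alloc_ordered_workqueue("sched-submit", 0);

	queue_work(wq, &sched->work_run_job);	/* run jobs from entities */
	queue_work(wq, &sched->work_free_job);	/* retire completed jobs */
	queue_work(wq, &q->work_process_msg);	/* Xe-specific messages */

	/*
	 * On a shared unbound wq (WQ_UNBOUND with max_active > 1) the same
	 * three items may run concurrently, which breaks that assumption.
	 */

A pool of ordered wqs handed out round-robin keeps that per-scheduler
serialization (each queue is still ordered internally) while capping the
number of lock classes lockdep has to track.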

> > > So anyway.. ordered vs unordered - drm sched dictated or at the
> > > driver's choice?
> > > 
> > 
> > Default ordered, driver can override with unordered.
> 
> .. and this, go back to my original question - whether the default queue
> must be ordered or not, or under which circumstances drivers can choose
> unordered. I think in drm_sched_init, where the kerneldoc says it will
> create an ordered queue, it would be good to document the rules.
>

Sure. Let me write something up.
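
Something like the following as a starting point, perhaps (strawman
kerneldoc wording for drm_sched_init, to be refined):

	/*
	 * @submit_wq: workqueue to use for run_job and free_job work items.
	 * If NULL, an ordered workqueue is allocated, preserving the old
	 * kthread behavior where run_job and free_job never execute in
	 * parallel. Drivers may pass an unordered workqueue only if they do
	 * not rely on that serialization, i.e. their run_job and free_job
	 * callbacks are safe to run concurrently.
	 */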

Matt

> Regards,
> 
> Tvrtko
