Hi all,

I just submit a v3 patch according your opinion on using kthread_park
instead.

Thanks,
Jingwen
On Fri Aug 20, 2021 at 09:20:42AM +0200, Christian König wrote:
> No, that perfectly works for me.
> 
> The problem we used to have with this approach was that we potentially have
> multiple timeouts at the same time.
> 
> But when we serialize the timeout handling by using a single workqueue as
> suggested by Daniel now as well then that isn't an issue any more.
> 
> Regards,
> Christian.
> 
> Am 20.08.21 um 09:12 schrieb Liu, Monk:
> > [AMD Official Use Only]
> > 
> > @Daniel Vetter @Grodzovsky, Andrey @Koenig, Christian
> > Do you have any concern on the kthread_park() approach ?
> > 
> > Theoretically speaking sched_main shall run there exclusively with 
> > job_timeout since they both touches jobs, and stop scheduler during 
> > job_timeout won't impact performance since in that scenario
> > There was already something wrong/stuck on that ring/scheduler
> > 
> > Thanks
> > 
> > ------------------------------------------
> > Monk Liu | Cloud-GPU Core team
> > ------------------------------------------
> > 
> > -----Original Message-----
> > From: Liu, Monk
> > Sent: Thursday, August 19, 2021 6:26 PM
> > To: Daniel Vetter <dan...@ffwll.ch>; Grodzovsky, Andrey 
> > <andrey.grodzov...@amd.com>
> > Cc: Alex Deucher <alexdeuc...@gmail.com>; Chen, JingWen 
> > <jingwen.ch...@amd.com>; Maling list - DRI developers 
> > <dri-de...@lists.freedesktop.org>; amd-gfx list 
> > <amd-gfx@lists.freedesktop.org>; Koenig, Christian 
> > <christian.koe...@amd.com>
> > Subject: RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad 
> > job."
> > 
> > [AMD Official Use Only]
> > 
> > Hi Daniel
> > 
> > > > Why can't we stop the scheduler thread first, so that there's 
> > > > guaranteed no race? I've recently had a lot of discussions with 
> > > > panfrost folks about their reset that spawns across engines, and 
> > > > without stopping the scheduler thread first before you touch anything 
> > > > it's just plain impossible.
> > Yeah we had this though as well in our mind.
> > 
> > Our second approach is to call ktrhead_stop() in job_timedout() routine so 
> > that  the "bad" job is guaranteed to be used without scheduler's touching 
> > or freeing, Check this sample patch one as well please:
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index a2a9536..50a49cb 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -319,17 +319,12 @@ static void drm_sched_job_timedout(struct work_struct 
> > *work)
> >          sched = container_of(work, struct drm_gpu_scheduler, 
> > work_tdr.work);
> >          /* Protects against concurrent deletion in 
> > drm_sched_get_cleanup_job */
> > +       kthread_park(sched->thread);
> >          spin_lock(&sched->job_list_lock);
> >          job = list_first_entry_or_null(&sched->pending_list,
> >                                         struct drm_sched_job, list);
> >          if (job) {
> > -               /*
> > -                * Remove the bad job so it cannot be freed by concurrent
> > -                * drm_sched_cleanup_jobs. It will be reinserted back after 
> > sched->thread
> > -                * is parked at which point it's safe.
> > -                */
> > -               list_del_init(&job->list);
> >                  spin_unlock(&sched->job_list_lock);
> >                  status = job->sched->ops->timedout_job(job);
> > @@ -345,6 +340,7 @@ static void drm_sched_job_timedout(struct work_struct 
> > *work)
> >          } else {
> >                  spin_unlock(&sched->job_list_lock);
> >          }
> > +       kthread_unpark(sched->thread);
> >          if (status != DRM_GPU_SCHED_STAT_ENODEV) {
> >                  spin_lock(&sched->job_list_lock); @@ -393,20 +389,6 @@ 
> > void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job 
> > *bad)
> >          kthread_park(sched->thread);
> >          /*
> > -        * Reinsert back the bad job here - now it's safe as
> > -        * drm_sched_get_cleanup_job cannot race against us and release the
> > -        * bad job at this point - we parked (waited for) any in progress
> > -        * (earlier) cleanups and drm_sched_get_cleanup_job will not be 
> > called
> > -        * now until the scheduler thread is unparked.
> > -        */
> > -       if (bad && bad->sched == sched)
> > -               /*
> > -                * Add at the head of the queue to reflect it was the 
> > earliest
> > -                * job extracted.
> > -                */
> > -               list_add(&bad->list, &sched->pending_list);
> > -
> > -       /*
> >           * Iterate the job list from later to  earlier one and either 
> > deactive
> >           * their HW callbacks or remove them from pending list if they 
> > already
> >           * signaled.
> > 
> > 
> > Thanks
> > 
> > ------------------------------------------
> > Monk Liu | Cloud-GPU Core team
> > ------------------------------------------
> > 
> > -----Original Message-----
> > From: Daniel Vetter <dan...@ffwll.ch>
> > Sent: Thursday, August 19, 2021 5:31 PM
> > To: Grodzovsky, Andrey <andrey.grodzov...@amd.com>
> > Cc: Daniel Vetter <dan...@ffwll.ch>; Alex Deucher <alexdeuc...@gmail.com>; 
> > Chen, JingWen <jingwen.ch...@amd.com>; Maling list - DRI developers 
> > <dri-de...@lists.freedesktop.org>; amd-gfx list 
> > <amd-gfx@lists.freedesktop.org>; Liu, Monk <monk....@amd.com>; Koenig, 
> > Christian <christian.koe...@amd.com>
> > Subject: Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad 
> > job."
> > 
> > On Wed, Aug 18, 2021 at 10:51:00AM -0400, Andrey Grodzovsky wrote:
> > > On 2021-08-18 10:42 a.m., Daniel Vetter wrote:
> > > > On Wed, Aug 18, 2021 at 10:36:32AM -0400, Andrey Grodzovsky wrote:
> > > > > On 2021-08-18 10:32 a.m., Daniel Vetter wrote:
> > > > > > On Wed, Aug 18, 2021 at 10:26:25AM -0400, Andrey Grodzovsky wrote:
> > > > > > > On 2021-08-18 10:02 a.m., Alex Deucher wrote:
> > > > > > > 
> > > > > > > > + dri-devel
> > > > > > > > 
> > > > > > > > Since scheduler is a shared component, please add dri-devel
> > > > > > > > on all scheduler patches.
> > > > > > > > 
> > > > > > > > On Wed, Aug 18, 2021 at 7:21 AM Jingwen Chen 
> > > > > > > > <jingwen.ch...@amd.com> wrote:
> > > > > > > > > [Why]
> > > > > > > > > for bailing job, this commit will delete it from pending
> > > > > > > > > list thus the bailing job will never have a chance to be
> > > > > > > > > resubmitted even in advance tdr mode.
> > > > > > > > > 
> > > > > > > > > [How]
> > > > > > > > > after embeded hw_fence into amdgpu_job is done, the race
> > > > > > > > > condition that this commit tries to work around is
> > > > > > > > > completely solved.So revert this commit.
> > > > > > > > > This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90.
> > > > > > > > > v2:
> > > > > > > > > add dma_fence_get/put() around timedout_job to avoid
> > > > > > > > > concurrent delete during processing timedout_job
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Jingwen Chen <jingwen.ch...@amd.com>
> > > > > > > > > ---
> > > > > > > > >      drivers/gpu/drm/scheduler/sched_main.c | 23 
> > > > > > > > > +++++------------------
> > > > > > > > >      1 file changed, 5 insertions(+), 18 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > index a2a953693b45..f9b9b3aefc4a 100644
> > > > > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > > > @@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct 
> > > > > > > > > work_struct *work)
> > > > > > > > >      {
> > > > > > > > >             struct drm_gpu_scheduler *sched;
> > > > > > > > >             struct drm_sched_job *job;
> > > > > > > > > +       struct dma_fence *fence;
> > > > > > > > >             enum drm_gpu_sched_stat status =
> > > > > > > > > DRM_GPU_SCHED_STAT_NOMINAL;
> > > > > > > > > 
> > > > > > > > >             sched = container_of(work, struct
> > > > > > > > > drm_gpu_scheduler, work_tdr.work); @@ -325,11 +326,10 @@
> > > > > > > > > static void drm_sched_job_timedout(struct work_struct
> > > > > > > > > *work)
> > > > > > > > > 
> > > > > > > > >             if (job) {
> > > > > > > > >                     /*
> > > > > > > > > -                * Remove the bad job so it cannot be freed 
> > > > > > > > > by concurrent
> > > > > > > > > -                * drm_sched_cleanup_jobs. It will be 
> > > > > > > > > reinserted back after sched->thread
> > > > > > > > > -                * is parked at which point it's safe.
> > > > > > > > > +                * Get job->s_fence->parent here to avoid 
> > > > > > > > > concurrent delete during
> > > > > > > > > +                * processing timedout_job
> > > > > > > > >                      */
> > > > > > > > > -               list_del_init(&job->list);
> > > > > > > > > +               fence =
> > > > > > > > > + dma_fence_get(job->s_fence->parent);
> > > > > > > While this is true for amdgpu, it has no meaning for other
> > > > > > > drivers for whom we haven't done the refactoring of embedding
> > > > > > > HW fence (parent) into the job structure.
> > > > > > > In fact thinking
> > > > > > > about it, unless you do the HW fence embedding for all the
> > > > > > > drivers using the scheduler you cannot revert this patch or
> > > > > > > you will just break them.
> > > > > > btw, why did you do that embedding? I do still have my patches
> > > > > > with dma_fence annotations floating around, but my idea at least
> > > > > > was to fix that issue with a mempool, not with embeddeding. What
> > > > > > was the motivation for embedding the wh fence?
> > > > > > -Daniel
> > > > > The motivation was 2 fold, avoid memory allocation during jobs
> > > > > submissions (HW fence allocation) because as Christian explained
> > > > > this leads to deadlock with mm code during evictions due to memory
> > > > > pressure (Christian can clarify if I messed
> > > > Yeah that's the exact same thing I've chased with my dma_fence
> > > > annotations, but thus far zero to none interested in getting it
> > > > sorted. I think it'd be good to have some cross-driver agreement on
> > > > how this should be solved before someone just charges ahead ...
> > > > 
> > > > > this explanation). Second is to exactly revert this patch because
> > > > > while it solved the issue described in the patch it created
> > > > > another with drivers who baildc out early during TDR handling for
> > > > > various reason and the job would just leak because it was already
> > > > > removed form pending list.
> > > > Can't we reinsert it before we restart the scheduler thread? It
> > > > might need a separate list for that due to the lockless queue
> > > > tricks. Or am I thinking about the wrong kind of "we lost the job"?
> > > > -Danile
> > > 
> > > If you look at the original patch it would reinsert it even earlier -
> > > right after stopping the  SW scheduler thread, and even then it was to
> > > late for some drivers as they would decide to return back from their
> > > TDR handler even before that. It is solvable but in an ugly way as far
> > > as I see, you need to require each driver in his code to put the job
> > > back in the list if they do it before reaching the place where
> > > scheduler framework does it. Kind of spaghetti code seems to me.
> > Hm yeah I didn't realize this all happens before we stop the scheduler 
> > thread.
> > 
> > Why can't we stop the scheduler thread first, so that there's guaranteed no 
> > race? I've recently had a lot of discussions with panfrost folks about 
> > their reset that spawns across engines, and without stopping the scheduler 
> > thread first before you touch anything it's just plain impossible.
> > 
> > I'm also still not understanding what exactly you guys have done, can 
> > someone please dig out the the amdgpu patches that motivate all this maybe 
> > that's clearer? A full explanation would still be good since I've only 
> > started in scheduler stuff.
> > 
> > Another thing I recently pondered for tdr races looking at i915 code is 
> > whether the tdr should first block the completion fence for that job. My 
> > motivation is to have a race-free error capture (if the completion races 
> > then we might start evicting memory and everything goes boom), but maybe 
> > that helps here too. Some kind of atomic "block this fence from completing 
> > thing.
> > 
> > Or I'm I completely guessing in the wrong direction?
> > -Daniel
> > 
> > > Andrey
> > > 
> > > 
> > > > > Andrey
> > > > > 
> > > > > 
> > > > > > > Andrey
> > > > > > > 
> > > > > > > 
> > > > > > > > >                     spin_unlock(&sched->job_list_lock);
> > > > > > > > > 
> > > > > > > > >                     status =
> > > > > > > > > job->sched->ops->timedout_job(job);
> > > > > > > > > @@ -342,6 +342,7 @@ static void drm_sched_job_timedout(struct 
> > > > > > > > > work_struct *work)
> > > > > > > > >                             job->sched->ops->free_job(job);
> > > > > > > > >                             sched->free_guilty = false;
> > > > > > > > >                     }
> > > > > > > > > +               dma_fence_put(fence);
> > > > > > > > >             } else {
> > > > > > > > >                     spin_unlock(&sched->job_list_lock);
> > > > > > > > >             }
> > > > > > > > > @@ -392,20 +393,6 @@ void drm_sched_stop(struct
> > > > > > > > > drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> > > > > > > > > 
> > > > > > > > >             kthread_park(sched->thread);
> > > > > > > > > 
> > > > > > > > > -       /*
> > > > > > > > > -        * Reinsert back the bad job here - now it's safe as
> > > > > > > > > -        * drm_sched_get_cleanup_job cannot race against us 
> > > > > > > > > and release the
> > > > > > > > > -        * bad job at this point - we parked (waited for) any 
> > > > > > > > > in progress
> > > > > > > > > -        * (earlier) cleanups and drm_sched_get_cleanup_job 
> > > > > > > > > will not be called
> > > > > > > > > -        * now until the scheduler thread is unparked.
> > > > > > > > > -        */
> > > > > > > > > -       if (bad && bad->sched == sched)
> > > > > > > > > -               /*
> > > > > > > > > -                * Add at the head of the queue to reflect it 
> > > > > > > > > was the earliest
> > > > > > > > > -                * job extracted.
> > > > > > > > > -                */
> > > > > > > > > -               list_add(&bad->list, &sched->pending_list);
> > > > > > > > > -
> > > > > > > > >             /*
> > > > > > > > >              * Iterate the job list from later to  earlier 
> > > > > > > > > one and either deactive
> > > > > > > > >              * their HW callbacks or remove them from
> > > > > > > > > pending list if they already
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > > > 
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.ffwll.ch%2F&amp;data=04%7C01%7Cmonk.liu%40amd.com%7C27fcce7ca8dd4f39608508d962f40f33%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637649622657672189%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=JVZtg3AhbiA%2FDmVbNGo3MxVliO83nh8%2Fi50PCMsvwyY%3D&amp;reserved=0
> 

Reply via email to