On Mon, Jul 28, 2025 at 10:41 PM YuanShang Mao (River)
<yuanshang....@amd.com> wrote:
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> Hi Alexander
>
> > I didn't think we actually resubmitting jobs anymore.  How are we ending up 
> > trying to resubmit in the first place?
>
> It is not about resubmitting. amdgpu_vm_generation will check if the VM is 
> valid for this job to use. For example, if a gfx job depends on an sdma job, 
> which is already cancelled, then the gfx job should be skipped.

Ah, that makes sense.  Can you include that in the patch description?
With that included, the patch is
Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>

> Perhaps the dependencies between tasks should be resolved by the drm instead 
> of amdgpu.

If we can do that or check the dependencies via the job itself that
would be better in the long term.

Alex

>
> uint64_t amdgpu_vm_generation(struct amdgpu_device *adev, struct amdgpu_vm 
> *vm)
> {
>         uint64_t result = (u64)atomic_read(&adev->vram_lost_counter) << 32;
>
>         if (!vm)
>                 return result;
>
>         result += lower_32_bits(vm->generation);
>         /* Add one if the page tables will be re-generated on next CS */
>         if (drm_sched_entity_error(&vm->delayed))
>                 ++result;
>
>         return result;
> }
>
> Thanks
> River
>
>
> -----Original Message-----
> From: Alex Deucher <alexdeuc...@gmail.com>
> Sent: Tuesday, July 29, 2025 2:10 AM
> To: YuanShang Mao (River) <yuanshang....@amd.com>
> Cc: Deucher, Alexander <alexander.deuc...@amd.com>; Koenig, Christian 
> <christian.koe...@amd.com>; amd-gfx@lists.freedesktop.org; cao, lin 
> <lin....@amd.com>; Zhang, Tiantian (Celine) <tiantian.zh...@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: keep job->vm in amdgpu_job_prepare_job
>
> On Mon, Jul 28, 2025 at 5:01 AM YuanShang Mao (River) <yuanshang....@amd.com> 
> wrote:
> >
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> > Hi Alexander
> >
> >         Since Christian is on vacation. Could you help review the below 
> > patch?
> >         If set job->vm to null in amdgpu_job_prepare_job, the job which 
> > should be skipped in amdgpu_job_run will be submitted unexpectedly.
>
> I think we can just remove the amdgpu_vm_generation() check in 
> amdgpu_job_run().  I didn't think we actually resubmitting jobs anymore.  How 
> are we ending up trying to resubmit in the first place?
>
> Alex
>
> >
> > Thanks
> > River
> >
> >
> > -----Original Message-----
> > From: YuanShang Mao (River) <yuanshang....@amd.com>
> > Sent: Wednesday, July 23, 2025 5:06 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Koenig, Christian <christian.koe...@amd.com>; YuanShang Mao
> > (River) <yuanshang....@amd.com>
> > Subject: [PATCH] drm/amdgpu: keep job->vm in amdgpu_job_prepare_job
> >
> > job->vm is used in function amdgpu_job_run to get the page
> > table re-generation counter and decide whether the job should be skipped.
> >
> > Signed-off-by: YuanShang <yuanshang....@amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 7 -------
> >  1 file changed, 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index 45febdc2f349..18998343815e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -360,13 +360,6 @@ amdgpu_job_prepare_job(struct drm_sched_job *sched_job,
> >                         dev_err(ring->adev->dev, "Error getting VM ID 
> > (%d)\n", r);
> >                         goto error;
> >                 }
> > -               /*
> > -                * The VM structure might be released after the VMID is
> > -                * assigned, we had multiple problems with people trying to 
> > use
> > -                * the VM pointer so better set it to NULL.
> > -                */
> > -               if (!fence)
> > -                       job->vm = NULL;
> >                 return fence;
> >         }
> >
> > --
> > 2.25.1
> >

Reply via email to