On Mon, Jul 28, 2025 at 10:41 PM YuanShang Mao (River) <yuanshang....@amd.com> wrote: > > [AMD Official Use Only - AMD Internal Distribution Only] > > Hi Alexander > > > I didn't think we actually resubmitting jobs anymore. How are we ending up > > trying to resubmit in the first place? > > It is not about resubmitting. amdgpu_vm_generation will check if the VM is > valid for this job to use. For example, if a gfx job depends on an sdma job, > which is already cancelled, then the gfx job should be skipped.
Ah, that makes sense. Can you include that in the patch description? With that included, the patch is Reviewed-by: Alex Deucher <alexander.deuc...@amd.com> > Perhaps the dependencies between tasks should be resolved by the drm instead > of amdgpu. If we can do that or check the dependencies via the job itself that would be better in the long term. Alex > > uint64_t amdgpu_vm_generation(struct amdgpu_device *adev, struct amdgpu_vm > *vm) > { > uint64_t result = (u64)atomic_read(&adev->vram_lost_counter) << 32; > > if (!vm) > return result; > > result += lower_32_bits(vm->generation); > /* Add one if the page tables will be re-generated on next CS */ > if (drm_sched_entity_error(&vm->delayed)) > ++result; > > return result; > } > > Thanks > River > > > -----Original Message----- > From: Alex Deucher <alexdeuc...@gmail.com> > Sent: Tuesday, July 29, 2025 2:10 AM > To: YuanShang Mao (River) <yuanshang....@amd.com> > Cc: Deucher, Alexander <alexander.deuc...@amd.com>; Koenig, Christian > <christian.koe...@amd.com>; amd-gfx@lists.freedesktop.org; cao, lin > <lin....@amd.com>; Zhang, Tiantian (Celine) <tiantian.zh...@amd.com> > Subject: Re: [PATCH] drm/amdgpu: keep job->vm in amdgpu_job_prepare_job > > On Mon, Jul 28, 2025 at 5:01 AM YuanShang Mao (River) <yuanshang....@amd.com> > wrote: > > > > [AMD Official Use Only - AMD Internal Distribution Only] > > > > Hi Alexander > > > > Since Christian is on vacation. Could you help review the below > > patch? > > If set job->vm to null in amdgpu_job_prepare_job, the job which > > should be skipped in amdgpu_job_run will be submitted unexpectedly. > > I think we can just remove the amdgpu_vm_generation() check in > amdgpu_job_run(). I didn't think we actually resubmitting jobs anymore. How > are we ending up trying to resubmit in the first place? > > Alex > > > > > Thanks > > River > > > > > > -----Original Message----- > > From: YuanShang Mao (River) <yuanshang....@amd.com> > > Sent: Wednesday, July 23, 2025 5:06 PM > > To: amd-gfx@lists.freedesktop.org > > Cc: Koenig, Christian <christian.koe...@amd.com>; YuanShang Mao > > (River) <yuanshang....@amd.com> > > Subject: [PATCH] drm/amdgpu: keep job->vm in amdgpu_job_prepare_job > > > > job->vm is used in function amdgpu_job_run to get the page > > table re-generation counter and decide whether the job should be skipped. > > > > Signed-off-by: YuanShang <yuanshang....@amd.com> > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 7 ------- > > 1 file changed, 7 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > > index 45febdc2f349..18998343815e 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > > @@ -360,13 +360,6 @@ amdgpu_job_prepare_job(struct drm_sched_job *sched_job, > > dev_err(ring->adev->dev, "Error getting VM ID > > (%d)\n", r); > > goto error; > > } > > - /* > > - * The VM structure might be released after the VMID is > > - * assigned, we had multiple problems with people trying to > > use > > - * the VM pointer so better set it to NULL. > > - */ > > - if (!fence) > > - job->vm = NULL; > > return fence; > > } > > > > -- > > 2.25.1 > >