[AMD Official Use Only - AMD Internal Distribution Only] Hi Alexander
> I didn't think we actually resubmitting jobs anymore. How are we ending up > trying to resubmit in the first place? It is not about resubmitting. amdgpu_vm_generation will check if the VM is valid for this job to use. For example, if a gfx job depends on an sdma job, which is already cancelled, then the gfx job should be skipped. Perhaps the dependencies between tasks should be resolved by the drm instead of amdgpu. uint64_t amdgpu_vm_generation(struct amdgpu_device *adev, struct amdgpu_vm *vm) { uint64_t result = (u64)atomic_read(&adev->vram_lost_counter) << 32; if (!vm) return result; result += lower_32_bits(vm->generation); /* Add one if the page tables will be re-generated on next CS */ if (drm_sched_entity_error(&vm->delayed)) ++result; return result; } Thanks River -----Original Message----- From: Alex Deucher <alexdeuc...@gmail.com> Sent: Tuesday, July 29, 2025 2:10 AM To: YuanShang Mao (River) <yuanshang....@amd.com> Cc: Deucher, Alexander <alexander.deuc...@amd.com>; Koenig, Christian <christian.koe...@amd.com>; amd-gfx@lists.freedesktop.org; cao, lin <lin....@amd.com>; Zhang, Tiantian (Celine) <tiantian.zh...@amd.com> Subject: Re: [PATCH] drm/amdgpu: keep job->vm in amdgpu_job_prepare_job On Mon, Jul 28, 2025 at 5:01 AM YuanShang Mao (River) <yuanshang....@amd.com> wrote: > > [AMD Official Use Only - AMD Internal Distribution Only] > > Hi Alexander > > Since Christian is on vacation. Could you help review the below patch? > If set job->vm to null in amdgpu_job_prepare_job, the job which > should be skipped in amdgpu_job_run will be submitted unexpectedly. I think we can just remove the amdgpu_vm_generation() check in amdgpu_job_run(). I didn't think we actually resubmitting jobs anymore. How are we ending up trying to resubmit in the first place? Alex > > Thanks > River > > > -----Original Message----- > From: YuanShang Mao (River) <yuanshang....@amd.com> > Sent: Wednesday, July 23, 2025 5:06 PM > To: amd-gfx@lists.freedesktop.org > Cc: Koenig, Christian <christian.koe...@amd.com>; YuanShang Mao > (River) <yuanshang....@amd.com> > Subject: [PATCH] drm/amdgpu: keep job->vm in amdgpu_job_prepare_job > > job->vm is used in function amdgpu_job_run to get the page > table re-generation counter and decide whether the job should be skipped. > > Signed-off-by: YuanShang <yuanshang....@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 7 ------- > 1 file changed, 7 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > index 45febdc2f349..18998343815e 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > @@ -360,13 +360,6 @@ amdgpu_job_prepare_job(struct drm_sched_job *sched_job, > dev_err(ring->adev->dev, "Error getting VM ID > (%d)\n", r); > goto error; > } > - /* > - * The VM structure might be released after the VMID is > - * assigned, we had multiple problems with people trying to > use > - * the VM pointer so better set it to NULL. > - */ > - if (!fence) > - job->vm = NULL; > return fence; > } > > -- > 2.25.1 >