On 1/15/26 15:08, Alex Deucher wrote:
>>>
>>> Explicit re-emit is the only way this can easily work correctly.  We
>>> save the ring state and job state in the job and then we replay
>>> the state and re-emit a proper coherent packet stream after the reset.
>>> When we re-emit, we update the offsets as appropriate so that the
>>> logic works properly as you replay the job stream.  You can skip the
>>> IBs for the timed-out context, but as long as the rest of the logic is
>>> there, everything works.  Saving and restoring the opaque ring
>>> contents is much harder because you need to either save a bunch of
>>> pointers or try and determine which offsets to patch, etc.
>>
>> Or you tell the HW to continue at the place where it stopped executing 
>> before the reset, and use the conditional execute that all jobs are wrapped 
>> in anyway to determine whether they should execute or not, or overwrite the 
>> commands with NOPs for engines which don't use the conditional execute.
>>
> 
> Not all rings retain the contents of the ring after a reset

In that case I think we should not re-emit the work at all.

For example, even if VRAM is not lost after a GPU reset, the remaining state 
(VM etc.) is gone and can't be restored easily as far as I can see.

We should absolutely not re-emit pending jobs in that case. It's basically 
a gamble whether that works or not.

> or may not
> be able to start at arbitrary ring ptr locations.

Every engine can do that; we just have to insert NOPs until the ring ends up 
at the specific location.
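
Something like this would do it (just a sketch; target_wptr is a made-up 
parameter for wherever the engine should pick up again, and 
amdgpu_ring_insert_nop() already pads with the ring's NOP packet):

	/* Pad from the current write pointer up to the offset where the
	 * engine should continue after the reset.
	 */
	static void pad_ring_to_offset(struct amdgpu_ring *ring, u64 target_wptr)
	{
		u32 count = (target_wptr - ring->wptr) & ring->buf_mask;

		amdgpu_ring_insert_nop(ring, count);
	}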

> Plus only gfx and
> compute have conditional execution support.  For everything else you
> need to adjust the packet stream.

That is harmless, just overwrite with NOPs. That's certainly something all 
rings can do.
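
Again just a sketch of what I mean; start/count are hypothetical and would 
come from wherever we noted the job's position in the ring:

	/* Overwrite the dwords of a skipped job in place with the ring's
	 * NOP packet so the engine walks over them harmlessly.
	 */
	static void wipe_job_range(struct amdgpu_ring *ring, u32 start, u32 count)
	{
		u32 i;

		for (i = 0; i < count; i++)
			ring->ring[(start + i) & ring->buf_mask] = ring->funcs->nop;
	}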

>> Re-emitting the command stream would only be necessary if we need to change 
>> the commands in any way, and even if we needed to do that, I would say that 
>> we should not emit the commands again at all.
>>
> 
> The only case where we need to mess with anything is to support the
> set_q stuff and that is only supported on one gfx11 chip specifically
> for SR-IOV.
> 
>> I have patches in the pipeline to remove the job object from the reset path, 
>> so that we can free it up directly after submission again and completely 
>> solve all the lifetime issues we had with that.
>>
> 
> I don't really see any lifetime issues with the job after we fix the
> whole sched stop/start stuff.

It's not only the job object itself, but also all objects it eventually points 
to.

For example, job->vm is invalid after you initially emitted the job, and 
exactly that has caused issues tons of times.
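
Illustrative only, but this is the classic pattern that bit us:

	fence = amdgpu_job_submit(job);
	...
	/* userspace closes its context, amdgpu_vm_fini() tears the VM
	 * down, but the scheduler still holds on to the job */
	...
	/* the queue times out, the reset path looks at job->vm and
	 * dereferences memory that may already be freed or reused */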

> Moreover, having the job around (or we
> could hang the state on the fence, but that is less clean because
> there are potentially two fences per job that you need to keep track
> of that share common state) makes it much easier to re-emit the packet
> stream after a reset.  It's a lot easier to just call the emit
> functions on a clean ring than to deal with opaque ring contents.
> Depending on the ring you end up needing to keep lots of pointers to
> mark fences and job boundaries.  Then if you have to re-emit the same
> job multiple times, you have to re-adjust all of the pointers, plus
> deal with skipping the IBs while still emitting the fences.

And exactly that is what I'm trying to prevent. Emitting jobs multiple times 
was an extremely bad idea.

I can't count how many hours we have spent over the last 10 years just trying 
to get that working, and we still have the same problems we had at the beginning.

So I see absolutely no chance that this will change.

Christian.

> 
> Alex
> 
>> Re-emitting completely breaks that again.
>>
>> Christian.
>>
>>>
>>> Alex
>>>
>>>> Long story short, that is seriously not going to work. So absolutely clear 
>>>> NAK from my side to this approach.
>>>>
>>>> What we could do to avoid problems and patching pointers in the command 
>>>> stream is to emit only the fence signaling for skipped jobs and fill 
>>>> everything else with NOPs.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>  At that point the ring has the
>>>>> same state it had before the queue was reset and the state gets
>>>>> updated in the ring as the IBs are re-emitted.
>>>>>
>>>>> That's it.  The only other state dependent on the ring was the seq
>>>>> number to wait on for pipeline sync and I fixed that by making it
>>>>> explicit.
>>>>>
>>>>> Alex
>>>>>
>>>>>>>
>>>>>>>> If the relevant state is
>>>>>>>> stored in the job, you can re-emit it and get the same ring state each
>>>>>>>> time.
>>>>>>>
>>>>>>> No, you can't. Background is that the relevant state is not job 
>>>>>>> dependent, but inter-job dependent.
>>>>>>>
>>>>>>> In other words it doesn't depend on what job is executing now but 
>>>>>>> rather which one was executed right before that one.
>>>>>>>
>>>>>>> Or even worse, in the case of the set_q_mode packet, it depends on the 
>>>>>>> job after the one you want to execute.
>>>>>>>
>>>>>>> I absolutely cannot see how stuff like that is supposed to work with 
>>>>>>> re-submission.
>>>>>>
>>>>>> All you need to do is save the state that was used to emit the packets
>>>>>> in the original submission.
>>>>>>
>>>>>>>
>>>>>>>> If you end up with multiple queue resets in a row, it gets
>>>>>>>> really complex to try and save and restore opaque ring contents.  By
>>>>>>>> the time you fix up the state tracking to handle that, you end up
>>>>>>>> pretty close to this solution.
>>>>>>>
>>>>>>> Not even remotely; there is tons of state we would need to save and 
>>>>>>> restore, and a lot of it is outside of the job.
>>>>>>>
>>>>>>> Updating a few fence pointers on re-submission is absolutely trivial 
>>>>>>> compared to that.
>>>>>>
>>>>>> It's not that easy.  If you want to just emit the fences for bad
>>>>>> contexts rather than the whole IB stream, you can also potentially
>>>>>> mess up the ring state.  You'd end up needing a pile of pointers that
>>>>>> need to be recalculated on every reset to try and re-emit the
>>>>>> appropriate state again.  This approach also paves the way for
>>>>>> re-emitting state for all queues after adapter reset when VRAM is not
>>>>>> lost.
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> On 1/8/26 15:48, Alex Deucher wrote:
>>>>>>>>>> This set contains a number of bug fixes and cleanups for
>>>>>>>>>> IB handling that I worked on over the holidays.
>>>>>>>>>>
>>>>>>>>>> Patches 1-2:
>>>>>>>>>> Simple bug fixes.
>>>>>>>>>>
>>>>>>>>>> Patches 3-26:
>>>>>>>>>> Removes the direct submit path for IBs and requires
>>>>>>>>>> that all IB submissions use a job structure.  This
>>>>>>>>>> greatly simplifies the IB submission code.
>>>>>>>>>>
>>>>>>>>>> Patches 27-42:
>>>>>>>>>> Split IB state setup and ring emission.  This keeps all
>>>>>>>>>> of the IB state in the job.  This greatly simplifies
>>>>>>>>>> re-emission of non-timed-out jobs after a ring reset and
>>>>>>>>>> allows for re-emission multiple times if multiple resets
>>>>>>>>>> happen in a row.  It also properly handles dma fence
>>>>>>>>>> errors for timed-out jobs with adapter resets.
>>>>>>>>>>
>>>>>>>>>> Alex Deucher (42):
>>>>>>>>>>   drm/amdgpu/jpeg4.0.3: remove redundant sr-iov check
>>>>>>>>>>   drm/amdgpu: fix error handling in ib_schedule()
>>>>>>>>>>   drm/amdgpu: add new job ids
>>>>>>>>>>   drm/amdgpu/vpe: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/gfx6: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/gfx7: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/gfx8: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/gfx9: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/gfx9.4.2: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/gfx9.4.3: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/gfx10: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/gfx11: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/gfx12: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/gfx12.1: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/si_dma: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/cik_sdma: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/sdma2.4: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/sdma3: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/sdma4: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/sdma4.4.2: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/sdma5: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/sdma5.2: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/sdma6: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/sdma7: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu/sdma7.1: switch to using job for IBs
>>>>>>>>>>   drm/amdgpu: require a job to schedule an IB
>>>>>>>>>>   drm/amdgpu: mark fences with errors before ring reset
>>>>>>>>>>   drm/amdgpu: rename amdgpu_fence_driver_guilty_force_completion()
>>>>>>>>>>   drm/amdgpu: don't call drm_sched_stop/start() in asic reset
>>>>>>>>>>   drm/amdgpu: drop drm_sched_increase_karma()
>>>>>>>>>>   drm/amdgpu: plumb timedout fence through to force completion
>>>>>>>>>>   drm/amdgpu: change function signature for emit_pipeline_sync()
>>>>>>>>>>   drm/amdgpu: drop extra parameter for vm_flush
>>>>>>>>>>   drm/amdgpu: move need_ctx_switch into amdgpu_job
>>>>>>>>>>   drm/amdgpu: store vm flush state in amdgpu_job
>>>>>>>>>>   drm/amdgpu: split fence init and emit logic
>>>>>>>>>>   drm/amdgpu: split vm flush and vm flush emit logic
>>>>>>>>>>   drm/amdgpu: split ib schedule and ib emit logic
>>>>>>>>>>   drm/amdgpu: move drm sched stop/start into amdgpu_job_timedout()
>>>>>>>>>>   drm/amdgpu: add an all_instance_rings_reset ring flag
>>>>>>>>>>   drm/amdgpu: rework reset reemit handling
>>>>>>>>>>   drm/amdgpu: simplify per queue reset code
>>>>>>>>>>
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c  |   2 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |   2 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  13 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c   | 136 +++------
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c      | 289 ++++++++++----------
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c     |  40 ++-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h     |  13 +
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c    |  67 -----
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h    |  37 +--
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c    |   4 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c     |   2 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c     |  21 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c      | 141 +++++-----
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h      |   3 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c     |  45 +--
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/cik_sdma.c       |  36 ++-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c      |  41 ++-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c      |  41 ++-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c      |  41 ++-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c      |  33 ++-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c       |  28 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c       |  30 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c       | 143 +++++-----
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c       | 149 +++++-----
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c     |  26 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c     |  38 +--
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c      |   3 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c      |   3 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c      |   3 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c      |   3 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c    |   6 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c    |   3 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c    |   3 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c    |   3 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v5_3_0.c    |   3 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c      |  43 +--
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c      |  43 +--
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c      |  43 +--
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c    |  45 +--
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c      |  46 ++--
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c      |  45 +--
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c      |  45 +--
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c      |  45 +--
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c      |  45 +--
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/si_dma.c         |  34 ++-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c       |   8 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/vce_v3_0.c       |   4 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c       |   2 +
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c       |   2 +
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c       |   3 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c     |   4 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c     |   3 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c     |   3 +-
>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c     |   4 +-
>>>>>>>>>>  54 files changed, 952 insertions(+), 966 deletions(-)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>
>>
