The current GFX IRQ handling takes references on the priv_reg /
priv_inst / bad_op and (on v9) cp_ecc_error_irq interrupts in
late_init but drops them in hw_fini.  hw_fini also runs on paths where
late_init never took those references (an SR-IOV VF skips the
cp_ecc_error_irq get, an earlier IP init failure short-circuits
late_init, etc.), so the puts have to be defensively guarded with
amdgpu_irq_enabled().

Pair the gets and puts properly:

  - Patch 1 adds a per-block ras_suspend callback and uses it to
    move cp_ecc_error_irq's release into amdgpu_gfx_ras_suspend() /
    amdgpu_gfx_ras_fini().

  - Patch 2 moves the remaining priv_reg / priv_inst / bad_op and
    userq EOP IRQs from late_init to hw_init across gfx9, gfx9_4_3,
    gfx10, gfx11, gfx12_0 and gfx12_1, and drops the now-unnecessary
    amdgpu_irq_enabled() guards.  It also fixes a pre-existing
    partial-failure leak in set_userq_eop_interrupts().
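
As a rough illustration of the partial-failure case, the sketch below
shows one way to unwind already-acquired references when a later get
fails.  It is a simplified model under stated assumptions, not the
actual set_userq_eop_interrupts() code: struct eop_src, src_get(),
src_put(), enable_all() and disable_all() are hypothetical names, and
the fail_get flag exists only to inject a failure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IRQ source; fail_get forces src_get() to fail. */
struct eop_src {
	int refs;
	bool fail_get;
};

static int src_get(struct eop_src *s)
{
	if (s->fail_get)
		return -1;
	s->refs++;
	return 0;
}

static void src_put(struct eop_src *s)
{
	s->refs--;
}

/* Take one reference per source; on failure, drop the ones already taken. */
static int enable_all(struct eop_src *srcs, size_t n)
{
	size_t i;
	int r = 0;

	for (i = 0; i < n; i++) {
		r = src_get(&srcs[i]);
		if (r)
			goto unwind;
	}
	return 0;

unwind:
	/* i indexes the source that failed; release srcs[0..i-1] only. */
	while (i--)
		src_put(&srcs[i]);
	return r;
}

/* Counterpart of enable_all(): unconditional puts, no guard needed. */
static void disable_all(struct eop_src *srcs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		src_put(&srcs[i]);
}
```

Because enable_all() either holds every reference or none, the matching
disable_all() can put unconditionally, which is the property that lets
the guards go away in this model.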

Follow-up to https://patchwork.freedesktop.org/patch/728675/.

Yunxiang Li (2):
  drm/amdgpu/ras: add ras_suspend callback and use it for cp_ecc_error_irq
  drm/amdgpu/gfx: move fault and EOP IRQ get/put to hw_init/hw_fini
