Hi Fab,

Thanks for reaching out.

On the crash you linked (gitlab #443): I had a careful look and I don't
think this series will fix that one. The oops is in nouveau_fence_sync(),
on the TTM buffer-eviction / nouveau_bo_move() path, hit while a renderer
process allocates a GEM buffer and the IGP runs out of instance memory
(the "imem: OOM ... -28" line right before it). My patches are all
elsewhere:

  1/3 swaps the MSI re-arm method (interrupt delivery / FIFO stability).
  2/3 adds a NULL check in nv50_sor_atomic_disable() (display-encoder
      teardown). That is also a NULL deref, but in the modeset path, not
      the fence/BO path, so it is a different bug.
  3/3 retries a DisplayPort link check on an HPD IRQ.

None of them touch nouveau_fence.c or nouveau_bo.c, so they won't help
that specific crash, and I didn't want to give you false hope. For #443
itself, the fence/eviction code was reworked quite a bit after 6.12
(around 6.15/6.16), so the most useful next step is probably to check
whether a current kernel (6.15 or later, e.g. 6.18) still reproduces it
and, if so, to attach a fresh oops to the gitlab issue. That part of the
driver is outside what I work on, so the maintainers there are better
placed than I am to chase it.

Where the series might actually help is if you also see any of the
following symptoms, which are exactly what it targets:

  - sporadic FIFO errors / hangs that you work around with NvMSI=0
    (nouveau.config=NvMSI=0)                  -> patch 1/3
  - a kernel oops in nv50_sor_atomic_disable() when ending a Wayland
    session or switching VTs                  -> patch 2/3
  - DisplayPort flicker or a brief blackout after DPMS / monitor wake
                                              -> patch 3/3

The series was developed on 6.18 but does not depend on anything
6.18-only, so it is easy to try on 6.12:

  - Patches 2/3 and 3/3 apply to 6.12 as-is.
  - Patch 1/3 needs a one-line manual edit, because the g94_pci_func
    struct was reorganised between 6.12 and 6.18, so the hunk will not
    apply cleanly. The change itself is the same: in
    drivers/gpu/drm/nouveau/nvkm/subdev/pci/g94.c set
        .msi_rearm = nv46_pci_msi_rearm,
    instead of nv40_pci_msi_rearm. nv46_pci_msi_rearm already exists in
    6.12, so that single line is the whole change.
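
If you would rather script that one-line edit than do it by hand, here is
a minimal sketch. It assumes GNU sed and that the 6.12 line is written
exactly as ".msi_rearm = nv40_pci_msi_rearm," (same spacing); the helper
name swap_msi_rearm is made up for illustration and is not part of the
series. If the grep output still shows nv40_pci_msi_rearm, the pattern did
not match and the edit needs doing manually.

```shell
# Hypothetical helper, not part of the patch series.
# $1: path to nvkm/subdev/pci/g94.c inside a 6.12 kernel tree.
# Assumes GNU sed (-i edits in place) and the exact spacing shown below.
swap_msi_rearm() {
    # Replace the nv40 re-arm hook with the nv46 one on the .msi_rearm line,
    # then print every msi_rearm line so the result can be checked by eye.
    sed -i 's/\.msi_rearm = nv40_pci_msi_rearm,/.msi_rearm = nv46_pci_msi_rearm,/' "$1" &&
    grep -n 'msi_rearm' "$1"
}
```

From the top of the kernel tree it would be called as
swap_msi_rearm drivers/gpu/drm/nouveau/nvkm/subdev/pci/g94.c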

If you give them a try and they help with the MSI or display symptoms, I
would be glad to hear back. A Tested-by on the list would also genuinely
help the case for getting these merged.

Regards,
Marek

(Disclosure: drafted with help from an AI assistant, Claude; conclusions
are mine and verified.)
