Hi Fab,
Thanks for reaching out.
On the crash you linked (gitlab #443): I had a careful look and I don't
think this series will fix that one. The oops is in nouveau_fence_sync(),
on the TTM buffer-eviction / nouveau_bo_move() path, hit while a renderer
process allocates a GEM buffer and the IGP runs out of instance memory
(the "imem: OOM ... -28" line right before it). My patches are all
elsewhere:
  1/3 swaps the MSI re-arm method (interrupt delivery / FIFO stability).
  2/3 adds a NULL check in nv50_sor_atomic_disable() (display-encoder
      teardown). That is also a NULL deref, but in the modeset path, not
      the fence/BO path, so it is a different bug.
  3/3 retries a DisplayPort link check on an HPD IRQ.
None of them touch nouveau_fence.c or nouveau_bo.c, so they won't help
that specific crash, and I didn't want to give you false hope. For #443
itself, the fence/eviction code was reworked quite a bit after 6.12
(around 6.15/6.16), so the most useful next step is probably to check
whether a current kernel (6.15 or later, e.g. 6.18) still reproduces it
and, if so, to attach a fresh oops to the gitlab issue. That part of the
driver is outside what I work on, so the maintainers there are better
placed than I am to chase it.
Where the series might actually help is if you also see any of the
following symptoms, which are exactly what it targets:
- sporadic FIFO errors / hangs that you work around with NvMSI=0
  (nouveau.config=NvMSI=0) -> patch 1/3
- a kernel oops in nv50_sor_atomic_disable() when ending a Wayland
  session or switching VTs -> patch 2/3
- DisplayPort flicker or a brief blackout after DPMS / monitor wake
  -> patch 3/3
It was developed on 6.18 but does not depend on anything 6.18-only, so it
is easy to try on 6.12:
- Patches 2/3 and 3/3 apply to 6.12 as-is.
- Patch 1/3 needs a one-line manual edit, because the g94_pci_func
  struct was reorganised between 6.12 and 6.18. The change itself is
  the same as in the patch: in
  drivers/gpu/drm/nouveau/nvkm/subdev/pci/g94.c set
      .msi_rearm = nv46_pci_msi_rearm,
  instead of nv40_pci_msi_rearm. nv46_pci_msi_rearm already exists in
  6.12, so that single line is the whole change.
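
If you would rather script the 1/3 edit than do it by hand, something
like the following should work. Treat it as a rough sketch: the helper
name swap_msi_rearm is made up for illustration, it assumes GNU sed, and
it assumes the nv40_pci_msi_rearm assignment appears only in the
g94_pci_func struct in that file, so please look at the grep output
before trusting the result.

```shell
#!/bin/sh
# Hypothetical helper (not part of the series): switch the MSI re-arm
# hook from nv40_pci_msi_rearm to nv46_pci_msi_rearm in the given file.
swap_msi_rearm() {
    f="$1"
    # Show the current assignment(s) so you can confirm there is only one.
    grep -n 'msi_rearm' "$f"
    # Whitespace-tolerant in-place substitution (GNU sed syntax).
    sed -i 's/\(\.msi_rearm[[:space:]]*=[[:space:]]*\)nv40_pci_msi_rearm/\1nv46_pci_msi_rearm/' "$f"
    # Show the result for a quick visual check.
    grep -n 'msi_rearm' "$f"
}

# Usage, from the top of a 6.12 source tree:
#   swap_msi_rearm drivers/gpu/drm/nouveau/nvkm/subdev/pci/g94.c
```

Afterwards, "git diff" on that file should show exactly one changed line.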
If you give them a try and they help with the MSI or display symptoms, I
would be glad to hear back. A Tested-by on the list would also genuinely
help the case for getting these merged.
Regards,
Marek
(Disclosure: drafted with help from an AI assistant, Claude; conclusions
are mine and verified.)