Currently the GSP is left running and the WPR2 memory region untouched
when the driver is unbound. This is obviously not ideal for at least two
reasons:

- Probing requires setting up the WPR2 region, which cannot be done if
  there is already one in place. Hence the current requirement to reset
  the GPU (using e.g. `echo 1 >/sys/bus/pci/devices/.../reset`) before
  the driver can be probed again after removal.
- The running GSP may still attempt to access shared memory regions
  which the kernel might recycle.

On top of that, there is a nasty bug in the Blackwell VBIOS that
sometimes borks the GPU upon PCI reset, requiring a reboot. So relying
on the PCI reset to unload/reload Nova is really not practical here.

This series does what is needed to leave the GPU in a clean state after
unbind, for all currently supported GPUs. Blackwell support is just a
placeholder and will be completed by the Blackwell boot support series.

This revision is based on `drm-rust-next`. A branch with the series is
available at [1].

[1] https://github.com/Gnurou/linux/tree/b4/nova-unload

Signed-off-by: Alexandre Courbot <[email protected]>
---
Changes in v7:
- Rebase on current drm-rust-next.
- Drop merged patches.
- Integrate Eliot's unload-on-drop improvement.
- Use `&Gsp` instead of `Pin<&mut Gsp>` in HAL.
- Add a new patch that runs the unload bundle if `Gsp::boot` fails.
- Link to v6: https://patch.msgid.link/[email protected]

Changes in v6:
- Inline TU102 local `run_booter` method in its unique call site.
- Rename unload bundle field to `unload_bundle`.
- Make Sec2UnloadBundle private.
- Continue GSP teardown upon partial failure.
- Store the unload bundle into `NovaCore`.
- Take the unload bundle by value to make it one-shot.
- Link to v5: https://patch.msgid.link/[email protected]

Changes in v5:
- Rebase on top of the Device HRT series.
- Drop the now unneeded "gpu: nova-core: split BAR acquisition in
  unbind()".
- Link to v4: https://patch.msgid.link/[email protected]

Changes in v4:
- Remove `warn_on_err` macro as it isn't performing as expected and
  distracts from the goal of the series.
- Add John's patch from the Blackwell series refactoring the Booter
  Loader runner code.
- Add a GSP HAL and move the existing TU102/SEC2 boot sequence into it
  in preparation for the Hopper/Blackwell FSP boot path.
- Prepare the firmware required for unloading at probe time and save it
  into an unload bundle, as we cannot guarantee filesystem access at
  unload time.
- Constrain `UNLOADING_GUEST_DRIVER`'s visibility to the parent module.
- Also write the sentinel value `0xff` into `mbox1` when running Booter
  Unloader to align with OpenRM.
- Link to v3: https://patch.msgid.link/[email protected]

Changes in v3:
- Disambiguate doccomment for `warn_on_err`.
- Test the correct bit instead of the whole register value to determine
  that the GSP has stopped.
- Use an enum instead of a boolean to encode the power level when
  shutting down the GSP.
- Add missing newline to `dev_err`.
- Add missing doccomments for new types.
- Use values from bindings instead of magic numbers.
- Remove the redundant `get_gsp_info` function.
- Better document Booter Unloader mailbox sentinel value, and check the
  value of mbox0 upon return.
- Link to v2: https://patch.msgid.link/[email protected]

Changes in v2:
- Rebase on top of `master` and remove unneeded/obsolete preparatory
  patches.
- Tidy up the imports of commands from the `fw` module in the `gsp`
  module.
- Link to v1: https://patch.msgid.link/[email protected]

---
Alexandre Courbot (4):
      gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
      gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
      gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
      gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails

 drivers/gpu/nova-core/firmware/booter.rs          |   1 -
 drivers/gpu/nova-core/firmware/fwsec.rs           |   1 -
 drivers/gpu/nova-core/gpu.rs                      |  34 ++-
 drivers/gpu/nova-core/gsp.rs                      |   4 +
 drivers/gpu/nova-core/gsp/boot.rs                 | 290 +++++++++---------
 drivers/gpu/nova-core/gsp/commands.rs             |  43 +++
 drivers/gpu/nova-core/gsp/fw.rs                   |   4 +
 drivers/gpu/nova-core/gsp/fw/commands.rs          |  45 +++
 drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs |  11 +
 drivers/gpu/nova-core/gsp/hal.rs                  |  94 ++++++
 drivers/gpu/nova-core/gsp/hal/gh100.rs            |  51 ++++
 drivers/gpu/nova-core/gsp/hal/tu102.rs            | 349 ++++++++++++++++++++++
 drivers/gpu/nova-core/regs.rs                     |   5 +
 13 files changed, 775 insertions(+), 157 deletions(-)
---
base-commit: 0e42ec83d46ab8877d38d37493328ed7d1a24de8
change-id: 20251216-nova-unload-4029b3b76950

Best regards,
--
Alexandre Courbot <[email protected]>
