Currently the GSP is left running and the WPR2 memory region untouched when the driver is unbound. This is obviously not ideal for at least two reasons:
- Probing requires setting up the WPR2 region, which cannot be done if there is already one in place. Hence the current requirement to reset the GPU (using e.g. `echo 1 >/sys/bus/pci/devices/.../reset`) before the driver can be probed again after removal.
- The running GSP may still attempt to access shared memory regions which the kernel might recycle.

On top of that, there is a nasty bug in the Blackwell VBIOS that sometimes borks the GPU upon PCI reset, requiring a reboot. So relying on the PCI reset to unload/reload Nova is really not practical here.

This series does what is needed to leave the GPU in a clean state after unbind, for all currently supported GPUs. Blackwell support is basic and will be added alongside the Blackwell series if this can be merged first.

This revision rebases on top of the Device HRT series [1] and addresses the minor feedback received on v4.

A branch with the series and its required dependencies is available at [2].

[1] https://lore.kernel.org/[email protected]
[2] https://github.com/Gnurou/linux/tree/b4/nova-unload

Signed-off-by: Alexandre Courbot <[email protected]>
---
Changes in v5:
- Rebase on top of the Device HRT series.
- Drop the now unneeded "gpu: nova-core: split BAR acquisition in unbind()".
- Link to v4: https://patch.msgid.link/[email protected]

Changes in v4:
- Remove `warn_on_err` macro as it isn't performing as expected and distracts from the goal of the series.
- Add John's patch from the Blackwell series refactoring the Booter Loader runner code.
- Add a GSP HAL and move the existing TU102/SEC2 boot sequence into it in preparation for the Hopper/Blackwell FSP boot path.
- Prepare the firmware required for unloading at probe time and save it into an unload bundle, as we cannot guarantee filesystem access at unload time.
- Constrain `UNLOADING_GUEST_DRIVER`'s visibility to the parent module.
- Also write the sentinel value `0xff` into `mbox1` when running Booter Unloader to align with OpenRM.
- Link to v3: https://patch.msgid.link/[email protected]

Changes in v3:
- Disambiguate doccomment for `warn_on_err`.
- Test the correct bit instead of the whole register value to determine that the GSP has stopped.
- Use an enum instead of a boolean to encode the power level when shutting down the GSP.
- Add missing newline to `dev_err`.
- Add missing doccomments for new types.
- Use values from bindings instead of magic numbers.
- Remove the redundant `get_gsp_info` function.
- Better document Booter Unloader mailbox sentinel value, and check the value of mbox0 upon return.
- Link to v2: https://patch.msgid.link/[email protected]

Changes in v2:
- Rebase on top of `master` and remove unneeded/obsolete preparatory patches.
- Tidy up the imports of commands from the `fw` module in the `gsp` module.
- Link to v1: https://patch.msgid.link/[email protected]

---
Alexandre Courbot (6):
      gpu: nova-core: remove unneeded get_gsp_info proxy function
      gpu: nova-core: do not import firmware commands into GSP command module
      gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
      gpu: nova-core: gsp: shuffle boot code a bit to keep chipset-specific parts close
      gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
      gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding

John Hubbard (1):
      gpu: nova-core: refactor SEC2 booter loading into BooterFirmware::run()

 drivers/gpu/nova-core/driver.rs                   |   4 +
 drivers/gpu/nova-core/firmware/booter.rs          |  31 +-
 drivers/gpu/nova-core/firmware/fwsec.rs           |   1 -
 drivers/gpu/nova-core/gpu.rs                      |   7 +
 drivers/gpu/nova-core/gsp.rs                      |   4 +
 drivers/gpu/nova-core/gsp/boot.rs                 | 252 +++++-----------
 drivers/gpu/nova-core/gsp/commands.rs             |  71 +++--
 drivers/gpu/nova-core/gsp/fw.rs                   |   4 +
 drivers/gpu/nova-core/gsp/fw/commands.rs          |  44 +++
 drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs |  11 +
 drivers/gpu/nova-core/gsp/hal.rs                  |  92 ++++++
 drivers/gpu/nova-core/gsp/hal/gh100.rs            |  52 ++++
 drivers/gpu/nova-core/gsp/hal/tu102.rs            | 351 ++++++++++++++++++++++
 drivers/gpu/nova-core/regs.rs                     |   5 +
 14 files changed, 736 insertions(+), 193 deletions(-)
---
base-commit: 84d984f9fe9363f4700e20f7c95b2da67fb2fe63
change-id: 20251216-nova-unload-4029b3b76950

Best regards,
-- 
Alexandre Courbot <[email protected]>
