Hi everyone,

After discussion, and since the `num` module seems to be taking more time to reach consensus than the rest of this series, I have split it into its own patch series. For now, Nova uses ad-hoc code instead (only in a handful of places, thankfully), which will be replaced once the `num` patch series lands. The split should also let the `num` module get more attention, as it was until now buried inside a loosely related patch series.

This also includes an important fix for a bug discovered by Ben Skeggs in the falcon code: the bit indicating the completion of memory scrubbing was interpreted incorrectly, which created a race condition that could result in a failure to boot the GSP. :O

Other than that, a few more minor refinements took place, but nothing that changes this series considerably.

The last patch tries to organize the increasing number of TODO items we have in the code; until they can be addressed, it would be nice to understand which task in `todo.rst` they correspond to, so I took the liberty of annotating them all to that effect.

Usual disclaimer: this series currently only successfully probes Ampere GPUs, and does not allow the GPU to do anything useful yet. Upon successful probe, the driver will only display the range of the WPR2 region constructed by FWSEC-FRTS with debug priority:

  [ 95.436000] NovaCore 0000:01:00.0: WPR2: 0xffc00000-0xffce0000
  [ 95.436002] NovaCore 0000:01:00.0: GPU instance built

This series is based on v6.16-rc1 with no other dependencies.

There are bits of documentation still missing; these are addressed by Joel in his own documentation patch series [1]. I'll also double-check and send follow-up patches if anything is still missing after that.

[1] https://lore.kernel.org/rust-for-linux/20250503040802.1411285-1-joelagn...@nvidia.com/

Signed-off-by: Alexandre Courbot <acour...@nvidia.com>
---
Changes in v6:
- Add `dma_handle_with_offset` method to CoherentAllocation.
- Move the `num` module into its own patchset and use ad-hoc code for now.
- Add new items (and remove obsolete ones) to the TODO list, and tag `TODO` entries in the code with their corresponding task in the list.
- Add `TIMEOUT:` comments wherever a timeout is used.
- Fix bug while waiting for falcon mem scrubbing to finish (thanks Ben Skeggs!)
- Pass the firmware object instead of its DMA handle in `dma_wr`.
- Fix safety statements in `fwsec.rs`.
- Move FWSEC boot code to `FwsecFirmware` and a helper function of `Gpu` to simplify `Gpu::new`.
- Add helper methods to NV_PFB_PRI_MMU_WPR2_ADDR_* to obtain the exact address.
- Fix build errors and warnings with Rust 1.78.
- Link to v5: https://lore.kernel.org/r/20250612-nova-frts-v5-0-14ba7eaf1...@nvidia.com

Changes in v5:
- Rebased on top of 6.16-rc1.
- Improve invariants of CoherentAllocation related to the new `size` method.
- Use SZ_* consts when redefining BAR0 size.
- Split VBIOS patch into 3 patches (Joel)
- Convert all `Result<()>` into `Result`.
- Use `::cast<T>()` instead of ` as ` to convert pointer types.
- Use `KBox` instead of `Arc` for falcon HALs.
- Do not use `get_` prefix on methods that do not increase reference count.
- Replace arbitrary immediate values with proper constants.
- Use EIO to indicate firmware errors.
- Use inspect_err to be more verbose on which step of the FWSEC setup failed.
- Move sysmem flush page into its own type and add its registration to the FB HAL.
- Turn HAL getters into standalone functions.
- Patch FWSEC command at construction time.
- Force the signing stage (or an explicit non-signing state transition) on the firmware DMA objects.
- Link to v4: https://lore.kernel.org/r/20250521-nova-frts-v4-0-05dfd4f39...@nvidia.com

Changes in v4:
- Improve documentation of falcon security modes (thanks Joel!)
- Add the definition of the size of CoherentAllocation as one of its invariants.
- Better document GFW boot progress, registers and use wait_on() helper, and move it to `gfw` module instead of `devinit`.
- Add missing TODOs for workarounds waiting to be replaced by in-flight R4L features.
- Register macro: add the offset of the register as a type constant, and allow register aliases for registers which can be interpreted differently depending on context.
- Rework the `num` module using only macros (to allow use of overflowing ops), and add the `PowerOfTwo` type.
- Add a proper HAL to the `fb` module.
- Move HAL builders to impl blocks of Chipset.
- Add proper types and traits for signatures.
- Proactively split FalconFirmware into distinct traits to ease management of v2 vs v3 FWSEC headers that will be needed for Turing support.
- Link to v3: https://lore.kernel.org/r/20250507-nova-frts-v3-0-fcb027497...@nvidia.com

Changes in v3:
- Rebased on top of latest nova-next.
- Use the new Devres::access() and remove the now unneeded with_bar!() macro.
- Dropped `rust: devres: allow to borrow a reference to the resource's Device` as it is not needed anymore.
- Fixed more erroneous uses of `ERANGE` error.
- Optimized alignment computations of the FB layout a bit.
- Link to v2: https://lore.kernel.org/r/20250501-nova-frts-v2-0-b4a137175...@nvidia.com

Changes in v2:
- Rebased on latest nova-next.
- Fixed all clippy warnings.
- Added `count` and `size` methods to `CoherentAllocation`.
- Added method to obtain a reference to the `Device` from a `Devres` (this is super convenient).
- Split `DmaObject` into its own patch and added `Deref` implementation.
- Squashed field names from [3] into "extract FWSEC from BIOS".
- Fixed erroneous use of `ERANGE` error.
- Reworked `register!()` macro towards a more intuitive syntax, moved its helper macros into internal rules to avoid polluting the macro namespace.
- Renamed all registers to capital snake case to better match OpenRM.
- Removed declarations for registers that are not used yet.
- Added more documentation for items not covered by Joel's documentation patches.
- Removed timer device and replaced it with a helper function using `Ktime`. This also made [4] unneeded so it is dropped.
- Unregister the sysmem flush page upon device destruction.
- ... probably more that I forgot.
  >_<
- Link to v1: https://lore.kernel.org/r/20250420-nova-frts-v1-0-ecd1cca23...@nvidia.com

[3] https://lore.kernel.org/all/20250423225405.139613-6-joelagn...@nvidia.com/
[4] https://lore.kernel.org/lkml/20250420-nova-frts-v1-1-ecd1cca23...@nvidia.com/

---
Alexandre Courbot (21):
      rust: dma: fix comment
      rust: dma: expose the count and size of CoherentAllocation
      rust: dma: add dma_handle_with_offset method to CoherentAllocation
      rust: make ETIMEDOUT error available
      rust: sizes: add constants up to SZ_2G
      gpu: nova-core: use absolute paths in register!() macro
      gpu: nova-core: add delimiter for helper rules in register!() macro
      gpu: nova-core: expose the offset of each register as a type constant
      gpu: nova-core: allow register aliases
      gpu: nova-core: increase BAR0 size to 16MB
      gpu: nova-core: add helper function to wait on condition
      gpu: nova-core: wait for GFW_BOOT completion
      gpu: nova-core: add DMA object struct
      gpu: nova-core: register sysmem flush page
      gpu: nova-core: add falcon register definitions and base code
      gpu: nova-core: firmware: add ucode descriptor used by FWSEC-FRTS
      gpu: nova-core: compute layout of the FRTS region
      gpu: nova-core: add types for patching firmware binaries
      gpu: nova-core: extract FWSEC from BIOS and patch it to run FWSEC-FRTS
      gpu: nova-core: load and run FWSEC-FRTS
      gpu: nova-core: update and annotate TODO list

Joel Fernandes (3):
      gpu: nova-core: vbios: Add base support for VBIOS construction and iteration
      gpu: nova-core: vbios: Add support to look up PMU table in FWSEC
      gpu: nova-core: vbios: Add support for FWSEC ucode extraction

 Documentation/gpu/nova/core/todo.rst      |  107 +--
 drivers/gpu/nova-core/dma.rs              |   58 ++
 drivers/gpu/nova-core/driver.rs           |    6 +-
 drivers/gpu/nova-core/falcon.rs           |  554 ++++++++++++++
 drivers/gpu/nova-core/falcon/gsp.rs       |   24 +
 drivers/gpu/nova-core/falcon/hal.rs       |   54 ++
 drivers/gpu/nova-core/falcon/hal/ga102.rs |  119 +++
 drivers/gpu/nova-core/falcon/sec2.rs      |   10 +
 drivers/gpu/nova-core/fb.rs               |  136 ++++
 drivers/gpu/nova-core/fb/hal.rs           |   39 +
 drivers/gpu/nova-core/fb/hal/ga100.rs     |   57 ++
 drivers/gpu/nova-core/fb/hal/ga102.rs     |   36 +
 drivers/gpu/nova-core/fb/hal/tu102.rs     |   58 ++
 drivers/gpu/nova-core/firmware.rs         |  108 +++
 drivers/gpu/nova-core/firmware/fwsec.rs   |  423 +++++++++++
 drivers/gpu/nova-core/gfw.rs              |   41 +
 drivers/gpu/nova-core/gpu.rs              |  132 +++-
 drivers/gpu/nova-core/nova_core.rs        |    5 +
 drivers/gpu/nova-core/regs.rs             |  288 +++++++
 drivers/gpu/nova-core/regs/macros.rs      |   65 +-
 drivers/gpu/nova-core/util.rs             |   28 +
 drivers/gpu/nova-core/vbios.rs            | 1157 +++++++++++++++++++++++++++++
 rust/kernel/dma.rs                        |   48 +-
 rust/kernel/error.rs                      |    1 +
 rust/kernel/sizes.rs                      |   24 +
 25 files changed, 3504 insertions(+), 74 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250417-nova-frts-96ef299abe2c

Best regards,
-- 
Alexandre Courbot <acour...@nvidia.com>