Hi John,
On 11/11/2025 11:30 PM, John Hubbard wrote:
> NVIDIA GPUs are moving away from using NV_PMC_BOOT_0 to contain
> architecture and revision details, and will instead use NV_PMC_BOOT_42
> in the future. NV_PMC_BOOT_0 will contain a specific set of values
> that will mean "go read NV_PMC_BOOT_42 instead".
>
> Change the selection logic in Nova so that it will claim Turing and
> later GPUs. This will work for the foreseeable future, without any
> further code changes here, because all NVIDIA GPUs are considered, from
> the oldest supported on Linux (NV04), through the future GPUs.
[...]
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index cd58040b681b..8c5f46f6aaac 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -175,19 +175,41 @@ pub(crate) struct Spec {
>
> impl Spec {
> fn new(bar: &Bar0) -> Result<Spec> {
> + // Some brief notes about boot0 and boot42, in chronological order:
> + //
> + // NV04 through NV50:
> + //
> + // Not supported by Nova. boot0 is necessary and sufficient to identify these GPUs.
> + // boot42 may not even exist on some of these GPUs.
> + //
> + // Fermi through Volta:
> + //
> + // Not supported by Nova. boot0 is still sufficient to identify these GPUs, but boot42
> + // is also guaranteed to be both present and accurate.
> + //
> + // Turing and later:
> + //
> + // Supported by Nova. Identified by first checking boot0 to ensure that the GPU is not
> + // from an earlier (pre-Fermi) era, and then using boot42 to precisely identify the GPU.
> + // Somewhere in the Rubin timeframe, boot0 will no longer have space to add new GPU IDs.
> +
> let boot0 = regs::NV_PMC_BOOT_0::read(bar);
>
> - Spec::try_from(boot0)
> + if boot0.is_older_than_fermi() {
> + return Err(ENOTSUPP);
> + }
> +
> + Spec::try_from(regs::NV_PMC_BOOT_42::read(bar))
There is an inconsistency in the error returns here. For NV04 through NV50, this
returns -ENOTSUPP. For Fermi through Volta, it reads boot42 but then returns
-ENODEV, because `Spec::try_from()` -> `boot42.chipset()` will return -ENODEV.
I am OK with either error code, but it would be good to make them consistent.
There also does not seem to be a diagnostic when the chipset is not supported.
It would be good to print a message saying that the chipset did not match; right
now it just returns -ENODEV, which could be read as the device not existing.
-ENOTSUPP is better, but an actual error message in dmesg would be nice.
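One way to address both points, sketched in plain userspace Rust so it stands
alone: every unsupported case maps to the same error, and each failure leaves a
log line naming the cause. `Error`, `Chipset`, `Boot0`, `Boot42`, `identify()`
and the `log` vector are all hypothetical stand-ins (in the driver they would be
the kernel's ENOTSUPP, the regs::NV_PMC_BOOT_* accessors and dev_err!()), and
the chipset values are illustrative only.

```rust
/// Stand-in for a kernel error code (hypothetical; in Nova this would be
/// ENOTSUPP from the kernel crate).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    NotSupported,
}

/// Illustrative subset of supported chipsets (hypothetical; the real list
/// lives in Nova's chipset definitions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Chipset {
    TU102,
    GA102,
}

impl Chipset {
    /// Map a raw boot42 chipset field to a known chipset, if any.
    fn from_raw(raw: u32) -> Option<Chipset> {
        match raw {
            0x162 => Some(Chipset::TU102),
            0x172 => Some(Chipset::GA102),
            _ => None,
        }
    }
}

/// Hypothetical, already-decoded contents of NV_PMC_BOOT_0.
pub struct Boot0 {
    pub older_than_fermi: bool,
}

/// Hypothetical, already-decoded contents of NV_PMC_BOOT_42.
pub struct Boot42 {
    pub chipset: u32,
}

/// Identify the GPU. Every unsupported case returns the same error and
/// pushes exactly one diagnostic line to `log` (a stand-in for dev_err!()).
/// boot42 is passed as a closure so it is only read after the pre-Fermi
/// check has passed.
pub fn identify(
    boot0: &Boot0,
    read_boot42: impl FnOnce() -> Boot42,
    log: &mut Vec<String>,
) -> Result<Chipset, Error> {
    if boot0.older_than_fermi {
        log.push("GPU predates Fermi; not supported by Nova".to_string());
        return Err(Error::NotSupported);
    }

    let boot42 = read_boot42();
    Chipset::from_raw(boot42.chipset).ok_or_else(|| {
        // Name the raw value so dmesg says *why* probing failed.
        log.push(format!("unsupported chipset {:#x}", boot42.chipset));
        Error::NotSupported
    })
}
```

Taking boot42 as a closure keeps the read behind the pre-Fermi check, which
matches the patch comment that boot42 may not exist on some of those GPUs.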
With these,
Reviewed-by: Joel Fernandes <[email protected]>
Thanks.