nvidia: Stop using 32-bit MSR interfaces

sashiko-bot Sun, 28 Jun 2026 23:25:42 -0700

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) 
to consider:


Pre-existing issues:
- [High] This problem wasn't introduced by this patch, but there is a memory 
leak of the ioremap virtual memory area on suspend/resume.
- [High] This problem wasn't introduced by this patch, but an ignored 
initialization error on resume leads to a guaranteed NULL pointer dereference 
in the hardware TLB flush path.
- [High] This problem wasn't introduced by this patch, but a missing break in a 
timeout loop causes a soft lockup if the hardware fails to complete a TLB flush.
--

commit 809ee345f7a00dcaa6e844219a8b666a3f9ffadb
Author: Juergen Gross <[email protected]>

agp/nvidia: Stop using 32-bit MSR interfaces

This commit replaces the deprecated 32-bit MSR interfaces rdmsr() and wrmsr()
with their 64-bit equivalents in the NVIDIA AGP driver. This is part of a
planned removal of the 32-bit variants.

> diff --git a/drivers/char/agp/nvidia-agp.c b/drivers/char/agp/nvidia-agp.c
> index 4787391bb6b4f..3e760bc00afaa 100644
> --- a/drivers/char/agp/nvidia-agp.c
> +++ b/drivers/char/agp/nvidia-agp.c

[ ... ]

[Severity: High]
This isn't a bug introduced by this patch, but I noticed a potential memory
leak on suspend/resume.

Does this code leak the vmap area leading to vmalloc space exhaustion after
repeated suspend/resume cycles?

drivers/char/agp/nvidia-agp.c:nvidia_configure() {
...
        /* map aperture */
        apbase_phys = pci_resource_start(agp_bridge->dev, AGP_APERTURE_BAR);
        nvidia_private.aperture =
                (volatile u32 __iomem *) ioremap(apbase_phys, 33 * PAGE_SIZE);
...
}

Because the driver defines no suspend hook:

drivers/char/agp/nvidia-agp.c:
...
static DEFINE_SIMPLE_DEV_PM_OPS(agp_nvidia_pm_ops, NULL, agp_nvidia_resume);
...

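The previous ioremap() allocation is therefore never released via iounmap()
when the system suspends, and each resume calls nvidia_configure() again,
overwriting nvidia_private.aperture with a fresh mapping.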
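
One possible way to avoid the leak is to release the stale mapping before
creating a new one. The sketch below is a userspace model, not a kernel patch:
mock_ioremap(), mock_iounmap(), leak_model_configure() and
leak_model_resume_cycles() are hypothetical stand-ins invented for
illustration, and the 4096-byte page size is an assumption. Only the aperture
field and the 33-page mapping size come from the driver code quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only: these mocks stand in for ioremap()/iounmap(). */
static int live_mappings;

static void *mock_ioremap(size_t size)
{
	void *p = malloc(size);

	if (p)
		live_mappings++;
	return p;
}

static void mock_iounmap(void *p)
{
	if (p) {
		free(p);
		live_mappings--;
	}
}

static struct {
	void *aperture;
} leak_model_private;

/* Hypothetical configure step: drop any stale mapping before remapping. */
static int leak_model_configure(void)
{
	mock_iounmap(leak_model_private.aperture);
	leak_model_private.aperture = mock_ioremap(33 * 4096);
	if (!leak_model_private.aperture)
		return -1;
	return 0;
}

/* Run the initial configure plus n resume cycles; return live mappings. */
int leak_model_resume_cycles(int n)
{
	int i;

	assert(n >= 0);
	for (i = 0; i <= n; i++)
		if (leak_model_configure())
			break;
	return live_mappings;
}
```

In this model the number of live mappings stays at one no matter how many
resume cycles run. An alternative with the same effect would be to add a
suspend hook that performs the unmap.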

[Severity: High]
This is a pre-existing issue, but I spotted a potential NULL pointer
dereference in the hardware TLB flush path.

Could a failure in nvidia_configure() during resume lead to a kernel crash?

drivers/char/agp/nvidia-agp.c:agp_nvidia_resume() {
...
        /* reconfigure AGP hardware again */
        nvidia_configure();

        return 0;
}

The return value of nvidia_configure() is ignored. If ioremap() fails during
resume, nvidia_configure() stores NULL in nvidia_private.aperture and returns
-ENOMEM, yet agp_nvidia_resume() still reports success.

Later, when nvidia_tlbflush() is called, it unconditionally dereferences
nvidia_private.aperture:

drivers/char/agp/nvidia-agp.c:nvidia_tlbflush() {
...
        /* flush TLB entries */
        for (i = 0; i < 32 + 1; i++)
                temp = readl(nvidia_private.aperture+(i * PAGE_SIZE / sizeof(u32)));
...
}
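
A minimal userspace sketch of how the failure could be propagated and the
flush path guarded follows. It is a model under stated assumptions, not the
driver's code: mock_ioremap_fails, mock_window and every resume_model_*
function are hypothetical names, the flush model returns an int although the
real nvidia_tlbflush() may not, and -ENODEV for the guard is an arbitrary
choice made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: a flag decides whether the mock remap fails. */
static bool mock_ioremap_fails;
static unsigned int mock_window[4];

static struct {
	volatile unsigned int *aperture;
} resume_model_private;

static int resume_model_configure(void)
{
	resume_model_private.aperture = mock_ioremap_fails ? NULL : mock_window;
	if (!resume_model_private.aperture)
		return -ENOMEM;
	return 0;
}

/* Hypothetical resume handler: report the failure instead of returning 0. */
static int resume_model_resume(void)
{
	return resume_model_configure();
}

/* Hypothetical guard: refuse the flush reads when nothing is mapped. */
static int resume_model_tlbflush(void)
{
	if (!resume_model_private.aperture)
		return -ENODEV;
	(void)resume_model_private.aperture[0];
	return 0;
}

/* Usage: a failed remap is reported and the later flush is refused. */
int resume_model_demo(void)
{
	mock_ioremap_fails = true;
	assert(resume_model_resume() == -ENOMEM);
	assert(resume_model_tlbflush() == -ENODEV);

	mock_ioremap_fails = false;
	assert(resume_model_resume() == 0);
	return resume_model_tlbflush();
}
```

Either measure alone would avoid the crash in this model. Propagating the
error makes the failure visible to the PM core, while the guard protects any
caller that reaches the flush path anyway.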

[Severity: High]
This isn't a bug introduced by this patch, but the timeout loop in
nvidia_tlbflush() appears to be missing a break statement.

Can this cause a soft lockup if the hardware fails to complete a TLB flush?

drivers/char/agp/nvidia-agp.c:nvidia_tlbflush() {
...
                end = jiffies + 3*HZ;
                do {
                        pci_read_config_dword(nvidia_private.dev_1,
                                        NVIDIA_1_WBC, &wbc_reg);
                        if (time_before_eq(end, jiffies)) {
                                printk(KERN_ERR PFX
                                    "TLB flush took more than 3 seconds.\n");
                        }
                } while (wbc_reg & nvidia_private.wbc_mask);
...
}

If the hardware enters an unexpected state and the timeout expires, the code
prints an error message but lacks a break statement to exit the loop, which
could trap the kernel in an infinite loop.
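
The bounded form of such a polling loop can be sketched in userspace as
follows. Everything here is a model: fake_jiffies, MODEL_HZ, mock_read_wbc(),
model_time_before_eq() and model_wait_flush() are hypothetical stand-ins, the
mock register never clears its busy bit, and returning -ETIMEDOUT is one
possible choice rather than what the driver does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace model: a fake tick counter stands in for jiffies. */
#define MODEL_HZ 100u
static uint32_t fake_jiffies;

/* Wraparound-safe "a <= b", in the spirit of the kernel's time_before_eq(). */
static int model_time_before_eq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) <= 0;
}

/* Mock status read: bit 0 stays set forever; each read costs one tick. */
static uint32_t mock_read_wbc(void)
{
	fake_jiffies++;
	return 0x1;
}

/* Poll until the masked bits clear, giving up after 3 model seconds. */
int model_wait_flush(uint32_t wbc_mask)
{
	uint32_t end = fake_jiffies + 3 * MODEL_HZ;
	uint32_t wbc_reg;

	assert(wbc_mask != 0);
	do {
		wbc_reg = mock_read_wbc();
		if (model_time_before_eq(end, fake_jiffies))
			return -ETIMEDOUT; /* leave the loop, do not just log */
	} while (wbc_reg & wbc_mask);

	return 0;
}
```

With the stuck bit selected by the mask, the model gives up once the deadline
passes instead of spinning. With a mask that does not cover the stuck bit, it
returns 0 after a single read.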

-- 
Sashiko AI review · 
https://sashiko.dev/#/patchset/[email protected]?part=21

Re: [PATCH 21/32] agp/nvidia: Stop using 32-bit MSR interfaces