To: [email protected]

Hello,

Below is a detailed root cause analysis of this bug, based on direct
inspection of the 550.163.01 source on a Debian trixie system.

== ROOT CAUSE ==

The problem is in conftest.sh, in the vm_area_struct_has_const_vm_flags
detection block (line 6529).

This block decides whether to define NV_VM_AREA_STRUCT_HAS_CONST_VM_FLAGS
by compiling:

    #include <linux/mm_types.h>
    int conftest_vm_area_struct_has_const_vm_flags(void) {
        return offsetof(struct vm_area_struct, __vm_flags);
    }

In kernel 6.19, __vm_flags was removed from the vm_area_struct union,
so this compile test fails and NV_VM_AREA_STRUCT_HAS_CONST_VM_FLAGS
is left undefined.

With that macro undefined, nv-mm.h resolves nv_vm_flags_set() as follows
(nv_vm_flags_clear() uses the same #if chain):

    static inline void nv_vm_flags_set(struct vm_area_struct *vma,
                                       vm_flags_t flags)
    {
    #if !NV_CAN_CALL_VMA_START_WRITE
        nv_vma_start_write(vma);
        ACCESS_PRIVATE(vma, __vm_flags) |= flags;   /* FAILS on 6.19 */
    #elif defined(NV_VM_AREA_STRUCT_HAS_CONST_VM_FLAGS)
        vm_flags_set(vma, flags);                   /* correct for 6.19 */
    #else
        vma->vm_flags |= flags;
    #endif
    }

NV_CAN_CALL_VMA_START_WRITE is derived in nv-mm.h from the result of
the NV_IS_EXPORT_SYMBOL_GPL___vma_start_write check. On Debian kernels
__vma_start_write is exported GPL-only, so the non-GPL nvidia module
cannot call it. NV_CAN_CALL_VMA_START_WRITE is therefore always 0, and
the first branch is always taken, which accesses the __vm_flags field
that no longer exists.

The correct branch for kernel 6.19 is the second one (vm_flags_set),
but it is never reached because NV_VM_AREA_STRUCT_HAS_CONST_VM_FLAGS
is undefined.

== FIX ==

Add a second compile test to the same case block in conftest.sh,
immediately after the existing one and before the ;; terminator.
The new test checks for vm_flags_set(). If it compiles, it defines
NV_VM_AREA_STRUCT_HAS_CONST_VM_FLAGS, overriding the #undef that the
first test emits when it fails. This is safe because append_conftest()
writes its lines to stdout in order, and the C preprocessor acts on
whichever #define or #undef comes last. The new test passes the same
empty value as the first one, so on kernels where both tests succeed
the macro is redefined identically instead of triggering a
macro-redefinition warning; the #elif in nv-mm.h only checks whether
it is defined.

--- a/conftest.sh
+++ b/conftest.sh
@@ -6542,4 +6542,12 @@ vm_area_struct_has_const_vm_flags)
             }"
             compile_check_conftest "$CODE" \
                 "NV_VM_AREA_STRUCT_HAS_CONST_VM_FLAGS" "" "types"
+            CODE="
+            #include <linux/mm_types.h>
+            #include <linux/mm.h>
+            void conftest_vm_flags_set_exists(struct vm_area_struct *vma) {
+                vm_flags_set(vma, (vm_flags_t)0);
+            }"
+            compile_check_conftest "$CODE" \
+                "NV_VM_AREA_STRUCT_HAS_CONST_VM_FLAGS" "" "types"
         ;;
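The ordering argument can be checked with a minimal userspace model. The
sketch below hand-writes the directives the generated conftest header
would contain for the "types" category, in the order the tests run, and
then evaluates the same defined() test that nv-mm.h uses. It does not
involve the kernel or compile_check_conftest; the function names are
hypothetical, and the macro is given an empty value because only whether
it is defined matters here.

```c
/*
 * Userspace model of the generated conftest header on a 6.19 kernel.
 * Directives appear in the order the conftest tests run; the
 * preprocessor acts on whichever one comes last.
 * Function names are hypothetical, for illustration only.
 */

/* Unpatched: only the offsetof(__vm_flags) test runs, and it fails. */
#undef NV_VM_AREA_STRUCT_HAS_CONST_VM_FLAGS

int unpatched_has_const_vm_flags(void)
{
#if defined(NV_VM_AREA_STRUCT_HAS_CONST_VM_FLAGS)
    return 1;
#else
    return 0; /* the vm_flags_set() branch cannot be selected */
#endif
}

/* Patched: the second (vm_flags_set) test compiles and appends a #define. */
#define NV_VM_AREA_STRUCT_HAS_CONST_VM_FLAGS

int patched_has_const_vm_flags(void)
{
#if defined(NV_VM_AREA_STRUCT_HAS_CONST_VM_FLAGS)
    return 1; /* the later #define overrides the earlier #undef */
#else
    return 0;
#endif
}
```

Built with `cc -c`, unpatched_has_const_vm_flags() returns 0 and
patched_has_const_vm_flags() returns 1: the same macro name flips state
purely because of directive order.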

== AFFECTED ==

nvidia-kernel-dkms 550.163.01-4 on Linux 6.19.6+deb14-amd64.
Linux <= 6.18 unaffected.

== NOTE ==

The 550 branch is required for Maxwell (GTX 900) and Pascal (GTX 1000)
GPUs, which are not supported by the 575+ driver.

Regards
