https://bugzilla.kernel.org/show_bug.cgi?id=221012

            Bug ID: 221012
           Summary: GPU page fault on AMD RX 7600 XT after commit
                    bf2084a7b1d75d093b6a79df4c10142d49fbaa0e
           Product: Drivers
           Version: 2.5
          Hardware: AMD
                OS: Linux
            Status: NEW
          Severity: high
          Priority: P3
         Component: Video(DRI - non Intel)
          Assignee: [email protected]
          Reporter: [email protected]
        Regression: No

Created attachment 309237
  --> https://bugzilla.kernel.org/attachment.cgi?id=309237&action=edit
dmesg, lspci

A GPU page fault occurs when running HIP/ROCm workloads on the AMD Radeon RX
7600 XT with kernel version 6.18.2. The issue was introduced by commit
bf2084a7b1d75d093b6a79df4c10142d49fbaa0e, which modifies the alignment logic
for split SVM ranges and introduces the use of huge pages. Reverting this
commit resolves the error.

The fault reproduces consistently under the same workload and prevents
HIP-based tasks, such as PyTorch model training, from completing on this GPU.
It shows up as permission faults in the GPU driver logs (see the attached
dmesg).
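
To check a captured kernel log for these faults, something like the following
minimal Python sketch may help. It is an illustration, not part of the original
report: the function name find_gpu_faults and the match pattern are
assumptions, and the pattern only looks for lines that mention amdgpu together
with "page fault" or "PERMISSION_FAULTS". Adjust it to the wording in your own
dmesg output.

```python
import re

# Assumed pattern (not taken from the attachment): a line counts as a hit if
# it mentions "amdgpu" followed by "page fault" or "PERMISSION_FAULTS".
FAULT_RE = re.compile(r"amdgpu.*(page fault|PERMISSION_FAULTS)", re.IGNORECASE)


def find_gpu_faults(log_text):
    """Return the lines of log_text that look like amdgpu page-fault reports."""
    return [line for line in log_text.splitlines() if FAULT_RE.search(line)]
```

For example, after saving the log with "dmesg > dmesg.txt" (a hypothetical file
name), calling find_gpu_faults(open("dmesg.txt").read()) returns the matching
lines. An empty list after reverting the commit would be consistent with the
fault no longer occurring.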
