Hi, I would like to make sure this AMDKFD SVM regression is tracked by the Linux regression process.

GitLab report:
https://gitlab.freedesktop.org/drm/amd/-/work_items/4914

The regression was originally reported on 2026-01-27. It was bisected to
the same functional change that Alex Deucher's revert patch later targeted:

  448ee45353ef9fb1a34f5f26eb3f48923c6f0898
  drm/amdkfd: Use huge page size to check split svm range alignment

The affected kernel line I tested identifies the same change as:

  bf2084a7b1d75d093b6a79df4c10142d49fbaa0e

Alex's revert patch:
https://lists.freedesktop.org/archives/amd-gfx/2026-February/138824.html

A small C/HSA reproducer is now available in the GitLab report. It does not
require PyTorch, ComfyUI, Docker, model files, or the original workload. It
uses ROCr/HSA, an anonymous THP-advised host mapping, explicit KFD SVM
SET_ATTR ioctls, and an HSA SDMA D2H copy.

Single reproducer command, same binary on both kernels:

  ./kfd_svm_split_hsa_copy --upstream-ab

Same-machine A/B result on an RX 7600 XT:

  448ee453/bf2084a7 active:
    1/1 run faults with SDMA0 permission fault
    GCVM_L2_PROTECTION_FAULT_STATUS=0x00841A51

  448ee453/bf2084a7 locally reverted:
    10/10 runs complete
    no ROCr memory access fault
    no new GCVM/SDMA0 permission fault in dmesg

The bad fault page is inside the split tail and inside the SDMA copy range:

  critical tail: [0x722429d61..0x722429dff]
  copy pages:    [0x722429b30..0x722429d70]
  fault page:    0x722429d65

A full ftrace/PTE run with the same C reproducer/SVM sequence also shows:

  split_tail ... current_remap=0 old_remap=1 missed=1
  MISSED_REMAP_CANDIDATE split=tail
  no amdgpu_vm_update_ptes covering the fault page after the marker
    before the fault-side GET_ATTR

The suspected code issue is that the split-tail/head remap predicate
introduced by 448ee453/bf2084a7 can miss tails inside the final 512-page
block. Since prange->last is inclusive, ALIGN_DOWN(prange->last, 512) is the
start of the final block, not an exclusive upper bound.

I also sent a short follow-up to amd-gfx with the reproducer/A-B summary and
asked what original failure or workload 448ee453/bf2084a7 was intended to
fix:
https://lists.freedesktop.org/archives/amd-gfx/2026-June/145800.html

I can resend the reproducer source and summaries directly on-list if
preferred.
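
The inclusive-bound arithmetic behind the suspected code issue can be
checked in isolation. The following is a minimal userspace sketch, not the
kernel code: align_down() is a stand-in for the kernel's ALIGN_DOWN(), and
the two predicates are hypothetical illustrations of "bound taken from
last" versus "bound taken from last + 1". It also assumes that the range's
inclusive last page before the split equals the last page of the critical
tail (0x722429dff), which the report implies but does not state.

```c
#include <stdint.h>

/* 512 4-KiB pages per 2-MiB block, as in the report's "512-page block". */
#define BLOCK_PAGES 512ULL

/* Userspace stand-in for ALIGN_DOWN() with a power-of-two alignment. */
static uint64_t align_down(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

/*
 * Hypothetical predicate: uses ALIGN_DOWN(last) as if it were an upper
 * bound. Because last is inclusive, this value is really the START of
 * the final block, so any tail starting inside that block is rejected.
 */
static int below_aligned_last(uint64_t tail_start, uint64_t last)
{
	return tail_start < align_down(last, BLOCK_PAGES);
}

/*
 * Hypothetical comparison: derives an exclusive bound from last + 1.
 * Shown only to contrast the two bounds, not as a proposed fix.
 */
static int below_aligned_end(uint64_t tail_start, uint64_t last)
{
	return tail_start < align_down(last + 1, BLOCK_PAGES);
}
```

With the page numbers from the report, align_down(0x722429dff, 512) is
0x722429c00, which is below the tail start 0x722429d61, so
below_aligned_last() returns 0 for that tail. The bound derived from
last + 1 is 0x722429e00, and below_aligned_end() returns 1.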

#regzbot introduced: 448ee45353ef9fb1a34f5f26eb3f48923c6f0898
#regzbot monitor: https://gitlab.freedesktop.org/drm/amd/-/work_items/4914

Thanks,
Gerhard Schwanzer
