On 1/4/26 22:17, Mikulas Patocka wrote:
> If a process sets up a timer that periodically sends a signal in short
> intervals and if it executes some kernel code that calls
> mm_take_all_locks, we get random -EINTR failures.
> 
> The function mm_take_all_locks fails with -EINTR if there is a pending
> signal. The -EINTR is propagated up the call stack to userspace and
> userspace fails if it gets this error.
> 
> In order to fix these failures, this commit changes
> signal_pending(current) to fatal_signal_pending(current) in
> mm_take_all_locks, so that it is interrupted only if the signal is
> actually killing the process.
> 
> For example, this bug happens when using OpenCL on AMDGPU. Sometimes,
> probing the OpenCL device fails (strace shows that open("/dev/kfd")
> failed with -EINTR). Sometimes we get the message "amdgpu:
> init_user_pages: Failed to register MMU notifier: -4" in the syslog.
> 
> The bug can be reproduced with the following program.
> 
> To run this program, you need an AMD graphics card and the package
> "rocm-opencl" installed. You must not have the package "mesa-opencl-icd"
> installed, because it redirects the default OpenCL implementation to
> itself.
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <string.h>
> #include <signal.h>
> #include <sys/time.h>
> 
> #define CL_TARGET_OPENCL_VERSION       300
> #include <CL/opencl.h>
> 
> static void fn(void)
> {
>       while (1) {
>               int32_t err;
>               cl_device_id device;
>               err = clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
>               if (err != CL_SUCCESS) {
>                       fprintf(stderr, "clGetDeviceIDs failed: %d\n", err);
>                       exit(1);
>               }
>               write(2, "-", 1);
>       }
> }
> 
> static void alrm(int sig)
> {
>       write(2, ".", 1);
> }
> 
> int main(void)
> {
>       struct itimerval it;
>       struct sigaction sa;
>       memset(&sa, 0, sizeof sa);
>       sa.sa_handler = alrm;
>       sa.sa_flags = SA_RESTART;
>       sigaction(SIGALRM, &sa, NULL);
>       it.it_interval.tv_sec = 0;
>       it.it_interval.tv_usec = 50;
>       it.it_value.tv_sec = 0;
>       it.it_value.tv_usec = 50;
>       setitimer(ITIMER_REAL, &it, NULL);
>       fn();
>       return 1;
> }
> 
> I'm submitting this patch for the stable kernels, because this bug may
> cause random failures in any code that calls mm_take_all_locks.
> 
> Signed-off-by: Mikulas Patocka <[email protected]>
> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-November/133141.html
> Link: https://yhbt.net/lore/linux-mm/[email protected]/T/#u
> Cc: [email protected]
> Fixes: 7906d00cd1f6 ("mmu-notifiers: add mm_take_all_locks() operation")

Acked-by: Vlastimil Babka <[email protected]>

This makes sense to me as a backportable bugfix. But I wonder if going
forward we should rather make all that locking killable instead of the
hopeful checks between individual lock attempts.
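
For reference, the behavioural change in the patch can be modelled in
userspace roughly as below. This is only a toy sketch, not kernel code:
struct toy_task and the toy_*() helpers are made-up names, and the model
assumes that a fatal signal shows up as a pending SIGKILL, which is a
simplification of what the kernel does.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task's pending-signal state. */
struct toy_task {
	unsigned long long pending;	/* bit (sig - 1) set => sig is pending */
};

/* Mark a signal as pending on the toy task. */
static void toy_queue(struct toy_task *t, int sig)
{
	assert(sig >= 1 && sig <= 64);
	t->pending |= 1ULL << (sig - 1);
}

/* Models signal_pending(): true if any signal at all is pending. */
static bool toy_signal_pending(const struct toy_task *t)
{
	return t->pending != 0;
}

/*
 * Models fatal_signal_pending(): true only if SIGKILL is pending.
 * Assumption: fatal signals are represented as a pending SIGKILL.
 */
static bool toy_fatal_signal_pending(const struct toy_task *t)
{
	return (t->pending & (1ULL << (SIGKILL - 1))) != 0;
}

/*
 * Models one locking loop of mm_take_all_locks(): check for signals
 * before each lock attempt and bail out with -EINTR if the check fires.
 * fatal_only selects the patched check over the original one.
 */
static int toy_take_all_locks(const struct toy_task *t, int nr_vmas,
			      bool fatal_only)
{
	for (int i = 0; i < nr_vmas; i++) {
		bool stop = fatal_only ? toy_fatal_signal_pending(t)
				       : toy_signal_pending(t);

		if (stop)
			return -EINTR;
		/* the real code would take a lock for this VMA here */
	}
	return 0;
}
```

With only SIGALRM queued, the original check returns -EINTR while the
patched check lets the loop run to completion. Once SIGKILL is queued,
both variants bail out.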

> 
> ---
>  mm/vma.c |    8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> Index: mm/mm/vma.c
> ===================================================================
> --- mm.orig/mm/vma.c  2026-01-04 21:19:13.000000000 +0100
> +++ mm/mm/vma.c       2026-01-04 21:19:13.000000000 +0100
> @@ -2166,14 +2166,14 @@ int mm_take_all_locks(struct mm_struct *
>        * is reached.
>        */
>       for_each_vma(vmi, vma) {
> -             if (signal_pending(current))
> +             if (fatal_signal_pending(current))
>                       goto out_unlock;
>               vma_start_write(vma);

E.g. here I think we already added a killable variant recently?

>       }
>  
>       vma_iter_init(&vmi, mm, 0);
>       for_each_vma(vmi, vma) {
> -             if (signal_pending(current))
> +             if (fatal_signal_pending(current))
>                       goto out_unlock;
>               if (vma->vm_file && vma->vm_file->f_mapping &&
>                               is_vm_hugetlb_page(vma))
> @@ -2182,7 +2182,7 @@ int mm_take_all_locks(struct mm_struct *
>  
>       vma_iter_init(&vmi, mm, 0);
>       for_each_vma(vmi, vma) {
> -             if (signal_pending(current))
> +             if (fatal_signal_pending(current))
>                       goto out_unlock;
>               if (vma->vm_file && vma->vm_file->f_mapping &&
>                               !is_vm_hugetlb_page(vma))
> @@ -2191,7 +2191,7 @@ int mm_take_all_locks(struct mm_struct *
>  
>       vma_iter_init(&vmi, mm, 0);
>       for_each_vma(vmi, vma) {
> -             if (signal_pending(current))
> +             if (fatal_signal_pending(current))
>                       goto out_unlock;
>               if (vma->anon_vma)
>                       list_for_each_entry(avc, &vma->anon_vma_chain, same_vma)
> 
