On 23.10.24 08:24, Dmitry Vyukov wrote:
> Hi Florian, Lorenzo,
>
> This looks great!
>
> What I am VERY interested in is if poisoned pages cause SIGSEGV even when
> the access happens in the kernel. Namely, the syscall still returns EFAULT,
> but also SIGSEGV is queued on return to user-space.
>
> Catching bad accesses in system calls is currently the weak spot for
> all user-space bug detection tools (GWP-ASan, libefence, libefency, etc).
> It's almost possible with userfaultfd, but catching faults in the kernel
> requires admin capability, so not really an option for generic bug
> detection tools (+inconvenience of userfaultfd setup/handler).
> Intercepting all EFAULT from syscalls is not generally possible
> (w/o ptrace, usually not an option as well), and EFAULT does not always
> mean a bug.
>
> Triggering SIGSEGV even in syscalls would not just be a performance
> optimization, but a new useful capability that would allow such tools
> to catch more bugs.

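
For reference, a minimal userspace sketch of the gap described above. It assumes a PROT_NONE mapping as a stand-in for a poisoned page and uses write(2) on a pipe as the syscall that touches it; probe_syscall_fault() and on_segv() are hypothetical names for illustration and are not part of the series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Set by the handler if a SIGSEGV is ever delivered to this process. */
static volatile sig_atomic_t segv_seen;

static void on_segv(int sig)
{
    (void)sig;
    segv_seen = 1;
}

/*
 * Hypothetical helper: hand an inaccessible buffer to write(2) and report
 * the outcome. Returns the errno set by write(), 0 if write() unexpectedly
 * succeeded, or -1 if the setup itself failed.
 */
static int probe_syscall_fault(void)
{
    struct sigaction sa;
    long page = sysconf(_SC_PAGESIZE);
    int fds[2];
    int err = 0;
    void *guard;

    assert(page > 0);

    /* Install a SIGSEGV handler so a queued signal would be noticed. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    /* Assumption: PROT_NONE stands in for a poisoned/guard page. */
    guard = mmap(NULL, (size_t)page, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guard == MAP_FAILED)
        return -1;

    if (pipe(fds) != 0) {
        munmap(guard, (size_t)page);
        return -1;
    }

    /* The kernel, not user code, faults on the buffer while copying. */
    segv_seen = 0;
    if (write(fds[1], guard, 1) < 0)
        err = errno;

    close(fds[0]);
    close(fds[1]);
    munmap(guard, (size_t)page);
    return err;
}
```

On current kernels I would expect probe_syscall_fault() to return EFAULT with segv_seen still 0, i.e. the bad access is only visible to a tool that inspects the return value of every syscall.
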
Right, we discussed that offline also as a possible extension to the
userfaultfd SIGBUS mode.
I did not look into that yet, but I was wondering if there could be cases
where a different process could trigger that SIGSEGV, and how (and
whether) to handle that.
For example, ptrace (access_remote_vm()) -> GUP likely can trigger that.
I think with userfaultfd() we will currently return -EFAULT, because we
call get_user_page_vma_remote() that is not prepared for dropping the
mmap lock. Possibly that is the right thing to do, but not sure :)
These "remote" faults set FOLL_REMOTE -> FAULT_FLAG_REMOTE, so we might
be able to distinguish them and perform different handling.
--
Cheers,
David / dhildenb