This fixes a critical mmap_sem lock leak that leads to deadlock when a
page fault with FAULT_FLAG_RETRY_NOWAIT encounters a fatal signal.

The Bug Scenario:
=================

When get_user_pages() is called with FOLL_NOWAIT flag (e.g., from
direct I/O or certain network operations), it sets both
FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT.

The page fault path calls __lock_page_or_retry() which has special
handling for FAULT_FLAG_RETRY_NOWAIT (mm/filemap.c:1025-1026):

    if (flags & FAULT_FLAG_RETRY_NOWAIT)
        return 0;  /* Lock NOT released! */

This returns 0 (page not locked) but mmap_sem is STILL HELD, as
documented by the explicit "CAUTION" comment at line 1022.

The fault handler then sets VM_FAULT_RETRY and returns. When
__do_page_fault() sees VM_FAULT_RETRY with a fatal signal pending,
it incorrectly assumes the lock was already released (line 1318
comment: "the mmap_sem has already been released") and returns to
userspace or calls no_context() without releasing the lock.

Result: Process continues with mmap_sem held, leading to deadlock on
the next page fault or mmap_sem operation.

The Correct Pattern:
====================

The proper handling is documented in fs/userfaultfd.c:331-333:

    "If FAULT_FLAG_ALLOW_RETRY is set, the mmap_sem must be released
     before returning VM_FAULT_RETRY only if FAULT_FLAG_RETRY_NOWAIT
     is not set."

And implemented at lines 444-483:

    ret = VM_FAULT_RETRY;
    if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
        goto out;  /* Return WITHOUT releasing lock */

    /* ... */
    up_read(&mm->mmap_sem);  /* Release only in non-NOWAIT case */

The Fix:
========

Before returning to userspace or calling no_context(), explicitly
check if FAULT_FLAG_RETRY_NOWAIT was set. If yes, release mmap_sem
that was left held by __lock_page_or_retry().

The release is needed in both paths:
 - Userspace: Cannot return with kernel lock held
 - no_context(): May return via fixup_exception() for kernel faults
   (e.g., copy_from_user), and cannot continue with lock held

This matches the semantic contract documented in userfaultfd.c.

Observed Symptoms:
==================

Process deadlocks with stack trace showing:
 - Owns mmap_sem
 - Page fault from userspace (no syscall context)
 - Blocked in rwsem_down_read_failed trying to re-acquire mmap_sem

https://virtuozzo.atlassian.net/browse/PSBM-161478

Signed-off-by: Konstantin Khorenko <[email protected]>
---
 arch/x86/mm/fault.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index ebbedfd975730..0204ff4140ded 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1328,6 +1328,14 @@ __do_page_fault(struct pt_regs *regs, unsigned long 
error_code,
                                goto retry;
                }
 
+               /*
+                * FAULT_FLAG_RETRY_NOWAIT means the lock was NOT released
+                * by __lock_page_or_retry(). We must release it here before
+                * returning to either userspace or calling no_context().
+                */
+               if (flags & FAULT_FLAG_RETRY_NOWAIT)
+                       up_read(&mm->mmap_sem);
+
                /* User mode? Just return to handle the fatal exception */
                if (flags & FAULT_FLAG_USER)
                        return;
-- 
2.43.0

_______________________________________________
Devel mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/devel

Reply via email to