This fixes a critical mmap_sem lock leak that leads to deadlock when a
page fault with FAULT_FLAG_RETRY_NOWAIT encounters a fatal signal.
The Bug Scenario:
=================
When get_user_pages() is called with FOLL_NOWAIT flag (e.g., from
direct I/O or certain network operations), it sets both
FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT.
The page fault path calls __lock_page_or_retry() which has special
handling for FAULT_FLAG_RETRY_NOWAIT (mm/filemap.c:1025-1026):
if (flags & FAULT_FLAG_RETRY_NOWAIT)
return 0; /* Lock NOT released! */
This returns 0 (page not locked) but mmap_sem is STILL HELD, as
documented by the explicit "CAUTION" comment at line 1022.
The fault handler then sets VM_FAULT_RETRY and returns. When
__do_page_fault() sees VM_FAULT_RETRY with a fatal signal pending,
it incorrectly assumes the lock was already released (line 1318
comment: "the mmap_sem has already been released") and returns to
userspace or calls no_context() without releasing the lock.
Result: Process continues with mmap_sem held, leading to deadlock on
the next page fault or mmap_sem operation.
The Correct Pattern:
====================
The proper handling is documented in fs/userfaultfd.c:331-333:
"If FAULT_FLAG_ALLOW_RETRY is set, the mmap_sem must be released
before returning VM_FAULT_RETRY only if FAULT_FLAG_RETRY_NOWAIT
is not set."
And implemented at lines 444-483:
ret = VM_FAULT_RETRY;
if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
goto out; /* Return WITHOUT releasing lock */
/* ... */
up_read(&mm->mmap_sem); /* Release only in non-NOWAIT case */
The Fix:
========
Before returning to userspace or calling no_context(), explicitly
check if FAULT_FLAG_RETRY_NOWAIT was set. If yes, release mmap_sem
that was left held by __lock_page_or_retry().
The release is needed in both paths:
- Userspace: Cannot return with kernel lock held
- no_context(): May return via fixup_exception() for kernel faults
(e.g., copy_from_user), and cannot continue with lock held
This matches the semantic contract documented in userfaultfd.c.
Observed Symptoms:
==================
Process deadlocks with stack trace showing:
- Owns mmap_sem
- Page fault from userspace (no syscall context)
- Blocked in rwsem_down_read_failed trying to re-acquire mmap_sem
https://virtuozzo.atlassian.net/browse/PSBM-161478
Signed-off-by: Konstantin Khorenko <[email protected]>
---
arch/x86/mm/fault.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index ebbedfd975730..0204ff4140ded 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1328,6 +1328,14 @@ __do_page_fault(struct pt_regs *regs, unsigned long
error_code,
goto retry;
}
+ /*
+ * FAULT_FLAG_RETRY_NOWAIT means the lock was NOT released
+ * by __lock_page_or_retry(). We must release it here before
+ * returning to either userspace or calling no_context().
+ */
+ if (flags & FAULT_FLAG_RETRY_NOWAIT)
+ up_read(&mm->mmap_sem);
+
/* User mode? Just return to handle the fatal exception */
if (flags & FAULT_FLAG_USER)
return;
--
2.43.0
_______________________________________________
Devel mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/devel