TestSetPageHWPoison() is called without zone->lock, so its atomic
update to page->flags can race with non-atomic flag operations
that run under zone->lock in the buddy allocator.

In particular, __free_pages_prepare() does:

    page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;

This non-atomic read-modify-write correctly excludes
__PG_HWPOISON from the mask, but it can still lose a concurrent
TestSetPageHWPoison(): if the plain read happens before the
poison bit is set and the write-back happens after, the stale
value overwrites the bit.  Follow-up patches in this series add
similar non-atomic flag operations.
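
For illustration only, the lost update can be replayed in a
single-threaded userspace sketch that fixes the interleaving by
hand.  The DEMO_* bit positions and the demo_*() helpers are
hypothetical names made up for this example, not kernel
definitions; the load and store halves of the read-modify-write
are simply written out as separate statements.

```c
/* Hypothetical bit layout, chosen only for this illustration. */
#define DEMO_PG_HWPOISON   (1UL << 0)
#define DEMO_CHECK_AT_PREP (1UL << 1) /* stand-in for PAGE_FLAGS_CHECK_AT_PREP */

/*
 * Interleaved order: the poison bit is set between the load and
 * the store of the non-atomic "flags &= ~mask" on the other CPU.
 */
unsigned long demo_interleaved(void)
{
	unsigned long flags = DEMO_CHECK_AT_PREP;
	unsigned long tmp;

	tmp = flags;                       /* CPU A: load old flags   */
	flags |= DEMO_PG_HWPOISON;         /* CPU B: set poison bit   */
	flags = tmp & ~DEMO_CHECK_AT_PREP; /* CPU A: store stale copy */

	return flags;                      /* poison bit is gone      */
}

/*
 * Serialized order: the two updates do not overlap, so the
 * poison bit survives the mask clear.
 */
unsigned long demo_serialized(void)
{
	unsigned long flags = DEMO_CHECK_AT_PREP;

	flags |= DEMO_PG_HWPOISON;         /* CPU B runs to completion */
	flags &= ~DEMO_CHECK_AT_PREP;      /* CPU A runs to completion */

	return flags;                      /* poison bit still set     */
}
```

In this sketch demo_interleaved() returns a value with the poison
bit clear, while demo_serialized() keeps it set; the opposite
serialized order (clear first, then set) keeps it as well.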

Fix by acquiring zone->lock around TestSetPageHWPoison().  This
serializes the poison-bit update against all buddy allocator
flag manipulation done under zone->lock.  The cost is
negligible: one lock/unlock pair in an extremely rare path
(hardware memory errors).

Signed-off-by: Michael S. Tsirkin <[email protected]>
Assisted-by: Claude:claude-opus-4-6
---
 mm/memory-failure.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ee42d4361309..a6b61172dd13 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2348,6 +2348,8 @@ int memory_failure(unsigned long pfn, int flags)
        unsigned long page_flags;
        bool retry = true;
        int hugetlb = 0;
+       struct zone *zone;
+       unsigned long mf_flags;
 
        if (!sysctl_memory_failure_recovery)
                panic("Memory failure on page %lx", pfn);
@@ -2390,7 +2392,10 @@ int memory_failure(unsigned long pfn, int flags)
        if (hugetlb)
                goto unlock_mutex;
 
+       zone = page_zone(p);
+       spin_lock_irqsave(&zone->lock, mf_flags);
        if (TestSetPageHWPoison(p)) {
+               spin_unlock_irqrestore(&zone->lock, mf_flags);
                res = -EHWPOISON;
                if (flags & MF_ACTION_REQUIRED)
                        res = kill_accessing_process(current, pfn, flags);
@@ -2399,6 +2404,7 @@ int memory_failure(unsigned long pfn, int flags)
                action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
                goto unlock_mutex;
        }
+       spin_unlock_irqrestore(&zone->lock, mf_flags);
 
        /*
         * We need/can do nothing about count=0 pages.
-- 
MST

