TestSetPageHWPoison() is called without zone->lock, so its atomic
update to page->flags can race with non-atomic flag operations
that run under zone->lock in the buddy allocator.
In particular, __free_pages_prepare() does:
page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
This non-atomic read-modify-write, while correctly excluding
__PG_HWPOISON from the mask, can still lose a concurrent
TestSetPageHWPoison if the read happens before the poison bit
is set and the write happens after. Follow-up patches in this
series add similar non-atomic flag operations as well.
Fix by acquiring zone->lock around TestSetPageHWPoison. This
serializes with all buddy flag manipulation. The cost is
negligible: one lock/unlock in an extremely rare path
(hardware memory errors).
Signed-off-by: Michael S. Tsirkin <[email protected]>
Assisted-by: Claude:claude-opus-4-6
---
mm/memory-failure.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ee42d4361309..a6b61172dd13 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2348,6 +2348,8 @@ int memory_failure(unsigned long pfn, int flags)
unsigned long page_flags;
bool retry = true;
int hugetlb = 0;
+ struct zone *zone;
+ unsigned long mf_flags;
if (!sysctl_memory_failure_recovery)
panic("Memory failure on page %lx", pfn);
@@ -2390,7 +2392,10 @@ int memory_failure(unsigned long pfn, int flags)
if (hugetlb)
goto unlock_mutex;
+ zone = page_zone(p);
+ spin_lock_irqsave(&zone->lock, mf_flags);
if (TestSetPageHWPoison(p)) {
+ spin_unlock_irqrestore(&zone->lock, mf_flags);
res = -EHWPOISON;
if (flags & MF_ACTION_REQUIRED)
res = kill_accessing_process(current, pfn, flags);
@@ -2399,6 +2404,7 @@ int memory_failure(unsigned long pfn, int flags)
action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
goto unlock_mutex;
}
+ spin_unlock_irqrestore(&zone->lock, mf_flags);
/*
* We need/can do nothing about count=0 pages.
--
MST