On 6/9/26 18:15, Breno Leitao wrote: > On Tue, Jun 09, 2026 at 04:41:01PM +0200, David Hildenbrand (Arm) wrote: >> On 6/9/26 12:56, Breno Leitao wrote: >>> get_any_page() collapses every HWPoisonHandlable() rejection into a >>> single -EIO via the __get_hwpoison_page() -> -EBUSY -> shake_page() >>> -> retry path. That is correct for the transient case (a userspace >>> folio briefly off LRU during migration or compaction, which a later >>> shake can drag back), but wrong for stable kernel-owned pages: slab, >>> page-table, large-kmalloc and PG_reserved pages will never become >>> HWPoisonHandlable(), so the retry loop is wasted work and the final >>> -EIO loses the "this is structurally unrecoverable" information. >>> memory_failure() then maps -EIO into MF_MSG_GET_HWPOISON, which the >>> panic-on-unrecoverable sysctl deliberately does not act on. >>> >>> Introduce HWPoisonKernelOwned(), a small predicate that positively >>> identifies pages the hwpoison handler cannot recover from: >>> >>> HWPoisonKernelOwned(p, flags) := >>> !(MF_SOFT_OFFLINE && page_has_movable_ops(p)) && >>> (PageReserved(p) || >>> PageSlab(head) || PageTable(head) || PageLargeKmalloc(head)) >>> >>> where head = compound_head(p). >>> >>> PG_reserved is a per-page flag (PF_NO_COMPOUND) and is tested on the >>> page directly. The slab, page-table and large-kmalloc page-type bits >>> are only stored on the head page, so those tests resolve the compound >>> head first, then re-read compound_head(page) afterwards: a concurrent >>> split or compound free that moves head invalidates the just-read flags >>> and the loop retries. The lookup still takes no refcount, mirroring >>> the rest of get_any_page(); the recheck closes the common split race, >>> and a residual free->alloc->free in the same window can only mis-tag >>> a genuinely poisoned page, never reclassify a handlable one. >>> >>> The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors the >>> same exception in HWPoisonHandlable(): soft-offline is allowed to >>> migrate movable_ops pages even though they are not on the LRU, and >>> we must not pre-empt that with an unrecoverable verdict. >>> >>> The list is intentionally not exhaustive. vmalloc and kernel-stack >>> pages, for example, do not carry a page_type bit and would need a >>> different oracle; they keep going through the existing retry path >>> unchanged. This is the smallest set we can identify with certainty >>> by page type. >>> >>> Wire the helper into the top of get_any_page() to short-circuit >>> those pages before the retry loop runs. On a hit, drop the caller's >>> MF_COUNT_INCREASED reference (if any) and return -ENOTRECOVERABLE >>> straight away. Pages outside the helper's positive list still take >>> the existing retry path and return -EIO, leaving operator-visible >>> behaviour for those cases unchanged. >>> >>> Extend the unhandlable-page pr_err() to fire for either errno and >>> update the get_hwpoison_page() kerneldoc to document the new return. >>> >>> memory_failure() still folds every negative return into >>> MF_MSG_GET_HWPOISON via its existing "else if (res < 0)" branch, so >>> this patch on its own only changes the errno that soft_offline_page() >>> can propagate to its callers. A follow-up wires -ENOTRECOVERABLE >>> through memory_failure() and reports MF_MSG_KERNEL for the >>> unrecoverable cases, which is what the >>> panic_on_unrecoverable_memory_failure sysctl observes. >>> >>> Suggested-by: David Hildenbrand <[email protected]> >>> Suggested-by: Lance Yang <[email protected]> >>> Signed-off-by: Breno Leitao <[email protected]> >>> --- >>> mm/memory-failure.c | 60 >>> +++++++++++++++++++++++++++++++++++++++++++++++++++-- >>> 1 file changed, 58 insertions(+), 2 deletions(-) >>> >>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>> index f4d3e6e20e13..eed9de387694 100644 >>> --- a/mm/memory-failure.c >>> +++ b/mm/memory-failure.c >>> @@ -1325,6 +1325,46 @@ static inline bool HWPoisonHandlable(struct page >>> *page, unsigned long flags) >>> return PageLRU(page) || is_free_buddy_page(page); >>> } >>> >>> +/* >>> + * Positive identification of pages the hwpoison handler cannot recover. >>> + * These page types are owned by kernel internals (no userspace mapping >>> + * to unmap, no file mapping to invalidate, no migration target), so the >>> + * shake_page() / retry loop in get_any_page() can never turn them into >>> + * something HWPoisonHandlable() will accept. Short-circuit them to >>> + * -ENOTRECOVERABLE so callers can panic on operator request instead of >>> + * spinning through retries that exit as a transient-looking -EIO. >>> + * >>> + * The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors >>> + * HWPoisonHandlable(): soft-offline is allowed to migrate movable_ops >>> + * pages even though they are not on the LRU. >>> + */ >>> +static inline bool HWPoisonKernelOwned(struct page *page, unsigned long >>> flags) >>> +{ >>> + struct page *head; >>> + >>> + if ((flags & MF_SOFT_OFFLINE) && page_has_movable_ops(page)) >>> + return false; >>> + >> >> On a second look: Do we really need that? The page types below never support >> migration. So I guess that check is not required? >> >> Apart from that, looks good with two comments: >> >> a) HWPoisonKernelOwned: this is not the common style for us to name >> functions. >> >> is_kernel_owned_page() or sth like that would do. > > Ack, I will rename it is_kernel_owned_page() > > In my defence, most of the functions similar to HWPoisonKernelOwned() > has this name format, and I got this discussion earlier (with Lance? > I think). Here are the similar function names in that file: > > * HWPoisonHandlable > * PageHWPoisonTakenOff() > * SetPageHWPoisonTakenOff
Some of these probably date back to our old way of handling page flags and things, like PageLRU. But we really should stop :) > > I will update in the new version. Thanks! Probably best to wait a bit, the merge window is coming up either way, so this will have to wait a bit either way. -- Cheers, David
