On Tue, Jun 02, 2026 at 03:05:32PM +0800, Miaohe Lin wrote:
> On 2026/5/27 22:06, Breno Leitao wrote:
> > Add a sysctl panic_on_unrecoverable_memory_failure (disabled by
> > default) that triggers a kernel panic when memory_failure()
> > encounters pages that cannot be recovered. This provides a clean
> > crash with useful debug information rather than allowing silent
> > data corruption or a delayed crash at an unrelated code path.
> >
> > Panic eligibility is intentionally narrow: only MF_MSG_KERNEL with
> > result == MF_IGNORED panics. After the previous patch, MF_MSG_KERNEL
> > covers PG_reserved pages and the kernel-owned pages promoted from
> > get_hwpoison_page() via -ENOTRECOVERABLE (slab, page tables,
> > large-kmalloc).
> >
> > All other action types are excluded:
> >
> > - MF_MSG_GET_HWPOISON and MF_MSG_KERNEL_HIGH_ORDER can be reached by
> > transient refcount races with the page allocator (an in-flight buddy
> > allocation has refcount 0 and is no longer on the buddy free list,
> > briefly), and panicking on them would risk killing the box for what
> > is actually a recoverable userspace page.
> >
> > - MF_MSG_UNKNOWN means identify_page_state() could not classify the
> > page; that is precisely the wrong basis for a panic decision.
> >
> > Signed-off-by: Breno Leitao <[email protected]>
> > ---
> > mm/memory-failure.c | 23 +++++++++++++++++++++++
> > 1 file changed, 23 insertions(+)
> >
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 14c0a958638c..dcd53dbc6aec 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;
> >
> > static int sysctl_enable_soft_offline __read_mostly = 1;
> >
> > +static int sysctl_panic_on_unrecoverable_mf __read_mostly;
> > +
> > atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
> >
> > static bool hw_memory_failure __read_mostly = false;
> > @@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = {
> > .proc_handler = proc_dointvec_minmax,
> > .extra1 = SYSCTL_ZERO,
> > .extra2 = SYSCTL_ONE,
> > + },
> > + {
> > + .procname = "panic_on_unrecoverable_memory_failure",
> > + .data = &sysctl_panic_on_unrecoverable_mf,
> > + .maxlen = sizeof(sysctl_panic_on_unrecoverable_mf),
> > + .mode = 0644,
> > + .proc_handler = proc_dointvec_minmax,
> > + .extra1 = SYSCTL_ZERO,
> > + .extra2 = SYSCTL_ONE,
> > }
> > };
> >
> > @@ -1255,6 +1266,15 @@ static void update_per_node_mf_stats(unsigned long pfn,
> > ++mf_stats->total;
> > }
> >
> > +static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
> > + enum mf_result result)
> > +{
> > + if (!sysctl_panic_on_unrecoverable_mf || result != MF_IGNORED)
> > + return false;
> > +
> > + return type == MF_MSG_KERNEL;
>
> Would it be more straightforward to write as something like:
>
> if (!sysctl_panic_on_unrecoverable_mf)
> return false;
>
> return (type == MF_MSG_KERNEL && result == MF_IGNORED);

Sure, that reads better. I'll fold the MF_IGNORED check into the return for
the next revision:

static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
				      enum mf_result result)
{
	if (!sysctl_panic_on_unrecoverable_mf)
		return false;

	return type == MF_MSG_KERNEL && result == MF_IGNORED;
}
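
As a sanity check that the rewrite is purely cosmetic, here is a small
userspace sketch comparing the two forms over every input combination. The
enums are trimmed stand-ins for the kernel definitions (only the names
mentioned in this thread, with arbitrary values), the sysctl is modeled as a
plain int, and panic_check_orig(), panic_check_new() and count_mismatches()
are hypothetical names used only for this comparison.

```c
#include <stdbool.h>

/* Trimmed stand-ins for the kernel enums; values are arbitrary here. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };
enum mf_action_page_type {
	MF_MSG_KERNEL,
	MF_MSG_KERNEL_HIGH_ORDER,
	MF_MSG_GET_HWPOISON,
	MF_MSG_UNKNOWN,
};

/* Models the sysctl knob: 0 = disabled (default), 1 = enabled. */
static int sysctl_panic_on_unrecoverable_mf;

/* Form posted in the patch: result check folded into the early return. */
static bool panic_check_orig(enum mf_action_page_type type,
			     enum mf_result result)
{
	if (!sysctl_panic_on_unrecoverable_mf || result != MF_IGNORED)
		return false;

	return type == MF_MSG_KERNEL;
}

/* Form suggested in review: knob check first, then one combined test. */
static bool panic_check_new(enum mf_action_page_type type,
			    enum mf_result result)
{
	if (!sysctl_panic_on_unrecoverable_mf)
		return false;

	return type == MF_MSG_KERNEL && result == MF_IGNORED;
}

/* Walk knob x type x result and count inputs where the two forms differ. */
static int count_mismatches(void)
{
	int mismatches = 0;

	for (int knob = 0; knob <= 1; knob++) {
		sysctl_panic_on_unrecoverable_mf = knob;
		for (int t = MF_MSG_KERNEL; t <= MF_MSG_UNKNOWN; t++) {
			for (int r = MF_IGNORED; r <= MF_RECOVERED; r++) {
				bool a = panic_check_orig(t, r);
				bool b = panic_check_new(t, r);

				if (a != b)
					mismatches++;
			}
		}
	}
	return mismatches;
}
```

With the knob enabled, both forms should report panic eligibility only for
the (MF_MSG_KERNEL, MF_IGNORED) pair, so the change is readability only.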