On Mon, Jun 08, 2026 at 03:14:51PM +0100, Lorenzo Stoakes wrote:
> On Mon, Jun 08, 2026 at 09:48:34AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jun 08, 2026 at 10:43:21AM +0100, Lorenzo Stoakes wrote:
> > > On Mon, Jun 08, 2026 at 04:34:23AM -0400, Michael S. Tsirkin wrote:
> > > > TestSetPageHWPoison() is called without zone->lock, so its atomic
> > > > update to page->flags can race with non-atomic flag operations
> > > > that run under zone->lock in the buddy allocator.
> > > >
> > > > In particular, __free_pages_prepare() does:
> > > >
> > > >     page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
> > > >
> > > > This non-atomic read-modify-write, while correctly excluding
> > > > __PG_HWPOISON from the mask, can still lose a concurrent
> > > > TestSetPageHWPoison if the read happens before the poison bit
> > > > is set and the write happens after.  Follow-up patches in this
> > > > series add similar non-atomic flag operations as well.
> > > >
> > > > Fix by acquiring zone->lock around TestSetPageHWPoison and
> > > > around ClearPageHWPoison in the retry path.  This
> > > > serializes with all buddy flag manipulation.  The cost is
> > > > negligible: one lock/unlock in an extremely rare path
> > > > (hardware memory errors).
> > > >
> > > > Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere
> > > > in this file operate on pages already removed from the buddy
> > > > allocator or on non-buddy pages (DAX, hugetlb), so they do not
> > > > need zone->lock protection.
> > > >
> > > > Acked-by: Miaohe Lin <[email protected]>
> > > > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > >
> > > Can we have Fixes: and Cc: stable and also send this separately please?
> > >
> > > These patches seem like unrelated fixups that you've discovered along the 
> > > way,
> > > and don't belong as part of the already rather large series, unless I'm 
> > > missing
> > > something here.
> > >
> > > Thanks, Lorenzo
> >
> > I think you are mising that they are a dependency, not unrelated.
> 
> Then say so.
> 
> > For example, this issue gets worse with the patchset as there are more
> > places that manipulate flags without atomics. No?
> 
> It's your job to make that case, not mine.
> 
> >
> >
> > You are welcome to send this to stable, but I think stable rules
> > preclude theoretical bugfixes.
> 
> It's a dependency but also theoretical?

As in, the race is exteremely hard to trigger and I have no idea if it
triggers for anyone, but it's obvious from reading the code that
theoretically it exists? Yes.

> >
> > As for Fixes: the issue has been there for decades. I wouldn't know
> > what to attribute it for.
> 
> Again, your job.

Alright, if you insist:

Fixes: 6a46079cf57a ("HWPOISON: The high level memory error handler in the VM 
v7")

now everyone running 2.6 kernels will backport this fix, I presume.


> >
> >
> > I guess I could send these separately, too, why not. Not sure
> > what this accomplishes, but hey.  But is that an ack? You want
> > this fix merged even before the feature?
> 
> I already made the case as to why, as have other maintainers.
> 
> If you need to review what an ack looks like please consult
> https://docs.kernel.org/process/5.Posting.html
> 
> Thanks, Lorenzo

I am merely asking if you want this patch in the set including
all these nits I had to fix.

-- 
MST


Reply via email to