On Fri, Mar 27, 2026 at 11:12:46AM -0400, Zi Yan wrote:
> On 27 Mar 2026, at 8:42, Lorenzo Stoakes (Oracle) wrote:
>
> > On Thu, Mar 26, 2026 at 09:42:50PM -0400, Zi Yan wrote:
> >> Replace it with a check on the max folio order of the file's address space
> >> mapping, making sure PMD_ORDER is supported.
> >>
> >> Signed-off-by: Zi Yan <[email protected]>
> >> ---
> >>  mm/huge_memory.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index c7873dbdc470..1da1467328a3 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct 
> >> vm_area_struct *vma)
> >>  {
> >>    struct inode *inode;
> >>
> >> -  if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
> >> -          return false;
> >> -
> >>    if (!vma->vm_file)
> >>            return false;
> >>
> >> @@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct 
> >> vm_area_struct *vma)
> >>    if (IS_ANON_FILE(inode))
> >>            return false;
> >>
> >> +  if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
> >> +          return false;
> >> +
> >
> > At this point I think this should be a separate function quite honestly and
> > share it with 2/10's use, and then you can put the comment in here re: anon
> > shmem etc.
> >
> > Though that won't apply here of course as shmem_allowable_huge_orders() 
> > would
> > have been invoked :)
> >
> > But no harm in refactoring it anyway, and the repetitive < PMD_ORDER stuff 
> > is
> > unfortunate.
> >
> > Buuut having said that is this right actually?
> >
> > Because we have:
> >
> >             if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
> >                     return orders;
> >
> > Above it, and now you're enabling huge folio file systems to do non-page 
> > fault
> > THP and that's err... isn't that quite a big change?
>
> That is what READ_ONLY_THP_FOR_FS does, creating THPs after page faults, 
> right?
> This patchset changes the condition from all FSes to FSes with large folio
> support.

No, READ_ONLY_THP_FOR_FS operates differently.

It explicitly _only_ is allowed for MADV_COLLAPSE and only if the file is
mounted read-only.

So due to:

                if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
                        return orders;

                if (((!in_pf || smaps)) && file_thp_enabled(vma))
                        return orders;

                      |    PF     | MADV_COLLAPSE | khugepaged |
                      |-----------|---------------|------------|
large folio fs        |     ✓     |       x       |      x     |
READ_ONLY_THP_FOR_FS  |     x     |       ✓       |      ✓     |

After this change:

                      |    PF     | MADV_COLLAPSE | khugepaged |
                      |-----------|---------------|------------|
large folio fs        |     ✓     |       ✓       |      ?     |

(I hope we're not enabling khugepaged for large folio fs - which shouldn't
be necessary anyway as we try to give them folios on page fault and they
use thp-friendly get_unused_area etc. :)

We shouldn't be doing this.

It should remain:

                      |    PF     | MADV_COLLAPSE | khugepaged |
                      |-----------|---------------|------------|
large folio fs        |     ✓     |       x       |      x     |

If we're going to remove it, we should first _just remove it_, not
simultaneously increase the scope of what all the MADV_COLLAPSE code is
doing without any confidence in any of it working properly.

And it makes the whole series misleading - you're actually _enabling_ a
feature not (only) _removing_ one.

So let's focus as David suggested on one thing at a time, incrementally.

And let's please try and sort some of this confusing mess out in the code
if at all possible...

>
> Will add a helper, mapping_support_pmd_folio(), for
> mapping_max_folio_order(inode->i_mapping) < PMD_ORDER.
>
> >
> > So yeah probably no to this patch as is :) we should just drop
> > file_thp_enabled()?
>
>
>
> >
> >>    return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> >>  }
> >>
> >> --
> >> 2.43.0
> >>
>
>
> Best Regards,
> Yan, Zi

Cheers, Lorenzo

Reply via email to