On 27 Mar 2026, at 12:08, Lorenzo Stoakes (Oracle) wrote: > On Fri, Mar 27, 2026 at 11:43:57AM -0400, Zi Yan wrote: >> On 27 Mar 2026, at 11:29, Lorenzo Stoakes (Oracle) wrote: >> >>> On Fri, Mar 27, 2026 at 11:12:46AM -0400, Zi Yan wrote: >>>> On 27 Mar 2026, at 8:42, Lorenzo Stoakes (Oracle) wrote: >>>> >>>>> On Thu, Mar 26, 2026 at 09:42:50PM -0400, Zi Yan wrote: >>>>>> Replace it with a check on the max folio order of the file's address >>>>>> space >>>>>> mapping, making sure PMD_ORDER is supported. >>>>>> >>>>>> Signed-off-by: Zi Yan <[email protected]> >>>>>> --- >>>>>> mm/huge_memory.c | 6 +++--- >>>>>> 1 file changed, 3 insertions(+), 3 deletions(-) >>>>>> >>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >>>>>> index c7873dbdc470..1da1467328a3 100644 >>>>>> --- a/mm/huge_memory.c >>>>>> +++ b/mm/huge_memory.c >>>>>> @@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct >>>>>> vm_area_struct *vma) >>>>>> { >>>>>> struct inode *inode; >>>>>> >>>>>> - if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) >>>>>> - return false; >>>>>> - >>>>>> if (!vma->vm_file) >>>>>> return false; >>>>>> >>>>>> @@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct >>>>>> vm_area_struct *vma) >>>>>> if (IS_ANON_FILE(inode)) >>>>>> return false; >>>>>> >>>>>> + if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER) >>>>>> + return false; >>>>>> + >>>>> >>>>> At this point I think this should be a separate function quite honestly >>>>> and >>>>> share it with 2/10's use, and then you can put the comment in here re: >>>>> anon >>>>> shmem etc. >>>>> >>>>> Though that won't apply here of course as shmem_allowable_huge_orders() >>>>> would >>>>> have been invoked :) >>>>> >>>>> But no harm in refactoring it anyway, and the repetitive < PMD_ORDER >>>>> stuff is >>>>> unfortunate. >>>>> >>>>> Buuut having said that is this right actually? >>>>> >>>>> Because we have: >>>>> >>>>> if (((in_pf || smaps)) && vma->vm_ops->huge_fault) >>>>> return orders; >>>>> >>>>> Above it, and now you're enabling huge folio file systems to do non-page >>>>> fault >>>>> THP and that's err... isn't that quite a big change? >>>> >>>> That is what READ_ONLY_THP_FOR_FS does, creating THPs after page faults, >>>> right? >>>> This patchset changes the condition from all FSes to FSes with large folio >>>> support. >>> >>> No, READ_ONLY_THP_FOR_FS operates differently. >>> >>> It explicitly _only_ is allowed for MADV_COLLAPSE and only if the file is >>> mounted read-only. >>> >>> So due to: >>> >>> if (((in_pf || smaps)) && vma->vm_ops->huge_fault) >>> return orders; >>> >>> if (((!in_pf || smaps)) && file_thp_enabled(vma)) >>> return orders; >>> >>> | PF | MADV_COLLAPSE | khugepaged | >>> |-----------|---------------|------------| >>> large folio fs | ✓ | x | x | >>> READ_ONLY_THP_FOR_FS | x | ✓ | ✓ | >>> >>> After this change: >>> >>> | PF | MADV_COLLAPSE | khugepaged | >>> |-----------|---------------|------------| >>> large folio fs | ✓ | ✓ | ? | >>> >>> (I hope we're not enabling khugepaged for large folio fs - which shouldn't >>> be necessary anyway as we try to give them folios on page fault and they >>> use thp-friendly get_unused_area etc. :) >>> >>> We shouldn't be doing this. >>> >>> It should remain: >>> >>> | PF | MADV_COLLAPSE | khugepaged | >>> |-----------|---------------|------------| >>> large folio fs | ✓ | x | x | >>> >>> If we're going to remove it, we should first _just remove it_, not >>> simultaneously increase the scope of what all the MADV_COLLAPSE code is >>> doing without any confidence in any of it working properly. >>> >>> And it makes the whole series misleading - you're actually _enabling_ a >>> feature not (only) _removing_ one. >> >> That is what my RFC patch does, but David and willy told me to do this. :) >> IIUC, with READ_ONLY_THP_FOR_FS, FSes with large folio support will >> get THP via MADV_COLLAPSE or khugepaged. So removing the code like I >> did in RFC would cause regressions. > > OK I think we're dealing with a union of the two states here. > > READ_ONLY_THP_FOR_FS is separate from large folio support, as checked by > file_thp_enabled(): > > static inline bool file_thp_enabled(struct vm_area_struct *vma) > { > struct inode *inode; > > if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) > return false; > > if (!vma->vm_file) > return false; > > inode = file_inode(vma->vm_file); > > if (IS_ANON_FILE(inode)) > return false; > > return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); > } > > So actually: > > | PF | MADV_COLLAPSE | khugepaged | > |-----------|---------------|------------| > large folio fs | ✓ | x | x | > READ_ONLY_THP_FOR_FS | x | ✓ | ✓ | > both! | ✓ | ✓ | ✓ | > > (Where it's impllied it's a read-only mapping obviously for the later two > cases.) > > Now without READ_ONLY_THP_FOR_FS you're going to: > > | PF | MADV_COLLAPSE | khugepaged | > |-----------|---------------|------------| > large folio fs | ✓ | x | x | > large folio + r/o | ✓ | ✓ | ✓ | > > And intentionally leaving behind the 'not large folio fs, r/o' case because > those file systems need to implement large folio support. > > I guess we'll regress those users but we don't care?
Yes. This also motivates FSes without large folio support to add large folio support instead of relying on READ_ONLY_THP_FOR_FS hack. > > I do think all this needs to be spelled out in the commit message though as > it's > subtle. > > Turns out this PitA config option is going to kick and scream a bit first > before > it goes... Sure. I will shameless steal your tables. Thank you for the contribution. ;) > >> >> I guess I need to rename the series to avoid confusion. How about? >> >> Remove read-only THP support for FSes without large folio support. > > Yup that'd be better :) > > Cheers, Lorenzo > >> >> [1] >> https://lore.kernel.org/all/[email protected]/ >> >>> >>> So let's focus as David suggested on one thing at a time, incrementally. >>> >>> And let's please try and sort some of this confusing mess out in the code >>> if at all possible... >>> >>>> >>>> Will add a helper, mapping_support_pmd_folio(), for >>>> mapping_max_folio_order(inode->i_mapping) < PMD_ORDER. >>>> >>>>> >>>>> So yeah probably no to this patch as is :) we should just drop >>>>> file_thp_enabled()? >>>> >>>> >>>> >>>>> >>>>>> return !inode_is_open_for_write(inode) && >>>>>> S_ISREG(inode->i_mode); >>>>>> } >>>>>> >>>>>> -- >>>>>> 2.43.0 >>>>>> >>>> >>>> >>>> Best Regards, >>>> Yan, Zi >>> >>> Cheers, Lorenzo >> >> >> Best Regards, >> Yan, Zi Best Regards, Yan, Zi

