On 27 Mar 2026, at 12:22, Lance Yang wrote: > On Fri, Mar 27, 2026 at 11:00:26AM -0400, Zi Yan wrote: >> On 27 Mar 2026, at 10:31, Lorenzo Stoakes (Oracle) wrote: >> >>> On Fri, Mar 27, 2026 at 10:26:53PM +0800, Baolin Wang wrote: >>>> >>>> >>>> On 3/27/26 10:12 PM, Lorenzo Stoakes (Oracle) wrote: >>>>> On Fri, Mar 27, 2026 at 09:45:03PM +0800, Baolin Wang wrote: >>>>>> >>>>>> >>>>>> On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote: >>>>>>> On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 3/27/26 9:42 AM, Zi Yan wrote: >>>>>>>>> collapse_file() requires FSes supporting large folio with at least >>>>>>>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem >>>>>>>>> with >>>>>>>>> huge option turned on also sets large folio order on mapping, so the >>>>>>>>> check >>>>>>>>> also applies to shmem. >>>>>>>>> >>>>>>>>> While at it, replace VM_BUG_ON with returning failure values. >>>>>>>>> >>>>>>>>> Signed-off-by: Zi Yan <[email protected]> >>>>>>>>> --- >>>>>>>>> mm/khugepaged.c | 7 +++++-- >>>>>>>>> 1 file changed, 5 insertions(+), 2 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >>>>>>>>> index d06d84219e1b..45b12ffb1550 100644 >>>>>>>>> --- a/mm/khugepaged.c >>>>>>>>> +++ b/mm/khugepaged.c >>>>>>>>> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct >>>>>>>>> mm_struct *mm, unsigned long addr, >>>>>>>>> int nr_none = 0; >>>>>>>>> bool is_shmem = shmem_file(file); >>>>>>>>> - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && >>>>>>>>> !is_shmem); >>>>>>>>> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1)); >>>>>>>>> + /* "huge" shmem sets mapping folio order and passes the check >>>>>>>>> below */ >>>>>>>>> + if (mapping_max_folio_order(mapping) < PMD_ORDER) >>>>>>>>> + return SCAN_FAIL; >>>>>>>> >>>>>>>> This is not true for anonymous shmem, since its large order allocation >>>>>>>> logic >>>>>>>> is similar to anonymous memory. That means it will not call >>>>>>>> mapping_set_large_folios() for anonymous shmem. >>>>>>>> >>>>>>>> So I think the check should be: >>>>>>>> >>>>>>>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER) >>>>>>>> return SCAN_FAIL; >>>>>>> >>>>>>> Hmm but in shmem_init() we have: >>>>>>> >>>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE >>>>>>> if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY) >>>>>>> SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge; >>>>>>> else >>>>>>> shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was >>>>>>> patched */ >>>>>>> >>>>>>> /* >>>>>>> * Default to setting PMD-sized THP to inherit the global >>>>>>> setting and >>>>>>> * disable all other multi-size THPs. >>>>>>> */ >>>>>>> if (!shmem_orders_configured) >>>>>>> huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER); >>>>>>> #endif >>>>>>> >>>>>>> And shm_mnt->mnt_sb is the superblock used for anon shmem. Also >>>>>>> shmem_enabled_store() updates that if necessary. >>>>>>> >>>>>>> So we're still fine right? >>>>>>> >>>>>>> __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() -> >>>>>>> __shmem_get_inode() which has: >>>>>>> >>>>>>> if (sbinfo->huge) >>>>>>> mapping_set_large_folios(inode->i_mapping); >>>>>>> >>>>>>> Shared for both anon shmem and tmpfs-style shmem. >>>>>>> >>>>>>> So I think it's fine as-is. >>>>>> >>>>>> I'm afraid not. Sorry, I should have been clearer. >>>>>> >>>>>> First, anonymous shmem large order allocation is dynamically controlled >>>>>> via >>>>>> the global interface (/sys/kernel/mm/transparent_hugepage/shmem_enabled) >>>>>> and >>>>>> the mTHP interfaces >>>>>> (/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled). >>>>>> >>>>>> This means that during anonymous shmem initialization, these interfaces >>>>>> might be set to 'never'. so it will not call mapping_set_large_folios() >>>>>> because sbinfo->huge is 'SHMEM_HUGE_NEVER'. >>>>>> >>>>>> Even if shmem large order allocation is subsequently enabled via the >>>>>> interfaces, __shmem_file_setup -> mapping_set_large_folios() is not >>>>>> called >>>>>> again. >>>>> >>>>> I see your point, oh this is all a bit of a mess... >>>>> >>>>> It feels like entirely the wrong abstraction anyway, since at best you're >>>>> getting a global 'is enabled'. >>>>> >>>>> I guess what happened before was we'd never call into this with ! r/o thp >>>>> for fs >>>>> && ! is_shmem. >>>> >>>> Right. >>>> >>>>> But now we are allowing it, but should STILL be gating on !is_shmem so >>>>> yeah your >>>>> suggestion is correct I think actualyl. >>>>> >>>>> I do hate: >>>>> >>>>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER) >>>>> >>>>> As a bit of code though. It's horrible. >>>> >>>> Indeed. >>>> >>>>> Let's abstract that... >>>>> >>>>> It'd be nice if we could find a way to clean things up in the lead up to >>>>> changes >>>>> in series like this instead of sticking with the mess, but I guess since >>>>> it >>>>> mostly removes stuff that's ok for now. >>>> >>>> I think this check can be removed from this patch. >>>> >>>> During the khugepaged's scan, it will call thp_vma_allowable_order() to >>>> check if the VMA is allowed to collapse into a PMD. >>>> >>>> Specifically, within the call chain thp_vma_allowable_order() -> >>>> __thp_vma_allowable_orders(), shmem is checked via >>>> shmem_allowable_huge_orders(), while other FSes are checked via >>>> file_thp_enabled(). >> >> But for madvise(MADV_COLLAPSE) case, IIRC, it ignores shmem huge config >> and can perform collapse anyway. This means without !is_shmem the check >> will break madvise(MADV_COLLAPSE). Let me know if I get it wrong, since > > Right. That will break MADV_COLLAPSE, IIUC. > > For MADV_COLLAPSE on anonymous shmem, eligibility is determined by the > TVA_FORCED_COLLAPSE path via shmem_allowable_huge_orders(), not by > whether the inode mapping got mapping_set_large_folios() at creation > time. > > Using mmap(MAP_SHARED | MAP_ANONYMOUS): > - create time: shmem_enabled=never, hugepages-2048kB/shmem_enabled=never > - collapse time: shmem_enabled=never, hugepages-2048kB/shmem_enabled=always > > With the !is_shmem guard, collapse succeeds. Without it, the same setup > fails with -EINVAL.
Thank you for the confirmation. I will fix it. > > Thanks, > Lance > >> I was in that TVA_FORCED_COLLAPSE email thread but does not remember >> everything there. >> >> >>> >>> It sucks not to have an assert. Maybe in that case make it a >>> VM_WARN_ON_ONCE(). >> >> Will do that as I replied to David already. >> >>> >>> I hate that you're left tracing things back like that... >>> >>>> >>>> For those other filesystems, Patch 5 has already added the following check, >>>> which I think is sufficient to filter out those FSes that do not support >>>> large folios: >>>> >>>> if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER) >>>> return false; >>> >>> 2 < 5, we won't tolerate bisection hazards. >>> >>>> >>>> >>>>>> Anonymous shmem behaves similarly to anonymous pages: it is controlled by >>>>>> the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders() to >>>>>> check for allowed large orders, rather than relying on >>>>>> mapping_max_folio_order(). >>>>>> >>>>>> The mapping_max_folio_order() is intended to control large page >>>>>> allocation >>>>>> only for tmpfs mounts. Therefore, I find the current code confusing and >>>>>> think it needs to be fixed: >>>>>> >>>>>> /* Don't consider 'deny' for emergencies and 'force' for testing */ >>>>>> if (sb != shm_mnt->mnt_sb && sbinfo->huge) >>>>>> mapping_set_large_folios(inode->i_mapping); >>>>> >> >> Hi Baolin, >> >> Do you want to send a fix for this? >> >> Also I wonder how I can distinguish between anonymous shmem code and tmpfs >> code. >> I thought they are the same thing except that they have different user >> interface, >> but it seems that I was wrong. >> >> >> Best Regards, >> Yan, Zi >> >> Best Regards, Yan, Zi

