On 1/27/26 7:29 PM, David Hildenbrand (Red Hat) wrote:
On 1/26/26 07:59, Qi Zheng wrote:
On 1/23/26 11:15 PM, Andreas Larsson wrote:
On 2025-12-17 10:45, Qi Zheng wrote:
From: Qi Zheng <[email protected]>
PT_RECLAIM can work on all architectures that support
MMU_GATHER_RCU_TABLE_FREE, so make PT_RECLAIM depend on
MMU_GATHER_RCU_TABLE_FREE.
BTW, change PT_RECLAIM to be enabled by default, since nobody should
want to turn it off.
Signed-off-by: Qi Zheng <[email protected]>
---
arch/x86/Kconfig | 1 -
mm/Kconfig | 9 ++-------
2 files changed, 2 insertions(+), 8 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 80527299f859a..0d22da56a71b0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -331,7 +331,6 @@ config X86
select FUNCTION_ALIGNMENT_4B
imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
- select ARCH_SUPPORTS_PT_RECLAIM if X86_64
select ARCH_SUPPORTS_SCHED_SMT if SMP
select SCHED_SMT if SMP
select ARCH_SUPPORTS_SCHED_CLUSTER if SMP
diff --git a/mm/Kconfig b/mm/Kconfig
index bd0ea5454af82..fc00b429b7129 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1447,14 +1447,9 @@ config ARCH_HAS_USER_SHADOW_STACK
The architecture has hardware support for userspace shadow call
stacks (eg, x86 CET, arm64 GCS or RISC-V Zicfiss).
-config ARCH_SUPPORTS_PT_RECLAIM
- def_bool n
-
config PT_RECLAIM
- bool "reclaim empty user page table pages"
- default y
- depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP
- select MMU_GATHER_RCU_TABLE_FREE
+ def_bool y
+ depends on MMU_GATHER_RCU_TABLE_FREE
help
Try to reclaim empty user page table pages in paths other than munmap
and exit_mmap path.
Hi,
This patch unfortunately results in a WARN_ON_ONCE and unaligned
accesses on sparc64:
$ stress-ng --mmaphuge 20 -t 60
stress-ng: info: [559] setting to a 1 min run per stressor
stress-ng: info: [559] dispatching hogs: 20 mmaphuge
[ 560.592569] ------------[ cut here ]------------
[ 560.592663] WARNING: kernel/rcu/tree.c:3098 at __call_rcu_common.constprop.0+0x200/0x760, CPU#4: stress-ng-mmaph/568
[ 560.592777] CPU: 4 UID: 1000 PID: 568 Comm: stress-ng-mmaph Not tainted 6.19.0-rc5-00127-g62fc9f6ccb97 #8 VOLUNTARY
[ 560.592805] Call Trace:
[ 560.592812] [<00000000004368b8>] dump_stack+0x8/0x60
[ 560.592844] [<0000000000482a60>] __warn+0xe0/0x140
[ 560.592878] [<0000000000482b64>] warn_slowpath_fmt+0xa4/0x120
[ 560.592901] [<0000000000526a40>] __call_rcu_common.constprop.0+0x200/0x760
[ 560.592931] [<0000000000526fd0>] call_rcu+0x10/0x20
[ 560.592954] [<0000000000730838>] tlb_remove_table+0x98/0xc0
[ 560.592986] [<000000000071bec4>] free_pgd_range+0x224/0x4c0
[ 560.593021] [<000000000071c35c>] free_pgtables+0x1fc/0x240
[ 560.593042] [<000000000074a6f0>] vms_clear_ptes+0x110/0x140
[ 560.593068] [<000000000074c3dc>] vms_complete_munmap_vmas+0x5c/0x280
[ 560.593094] [<000000000074de5c>] do_vmi_align_munmap+0x1dc/0x260
[ 560.593117] [<000000000074df80>] do_vmi_munmap+0xa0/0x140
[ 560.593142] [<000000000074fb2c>] __vm_munmap+0x8c/0x160
[ 560.593168] [<000000000072cfd4>] vm_munmap+0x14/0x40
[ 560.593190] [<00000000004402a8>] sys_64_munmap+0x88/0xa0
[ 560.593221] [<0000000000406274>] linux_sparc_syscall+0x34/0x44
[ 560.593274] ---[ end trace 0000000000000000 ]---
[ 560.593960] log_unaligned: 209 callbacks suppressed
[ 560.593979] Kernel unaligned access at TPC[526a4c] __call_rcu_common.constprop.0+0x20c/0x760
[ 560.594121] Kernel unaligned access at TPC[526864] __call_rcu_common.constprop.0+0x24/0x760
[ 560.594198] Kernel unaligned access at TPC[52b3c4] rcu_segcblist_enqueue+0x24/0x40
[ 560.594275] Kernel unaligned access at TPC[526860] __call_rcu_common.constprop.0+0x20/0x760
[ 560.594360] Kernel unaligned access at TPC[526864] __call_rcu_common.constprop.0+0x24/0x760
[ 567.054127] log_unaligned: 1105 callbacks suppressed
[ 567.054167] Kernel unaligned access at TPC[526860] __call_rcu_common.constprop.0+0x20/0x760
[ 567.054331] Kernel unaligned access at TPC[526864] __call_rcu_common.constprop.0+0x24/0x760
[ 567.054410] Kernel unaligned access at TPC[52b3c4] rcu_segcblist_enqueue+0x24/0x40
Thanks for your report!
On sparc64, pmd and pud levels are not of struct page:
Can you elaborate, I don't understand what you mean :)
On sparc64:
static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table,
				    bool is_page)
{
	unsigned long pgf = (unsigned long)table;

	if (is_page)
		pgf |= 0x1UL;
	tlb_remove_table(tlb, (void *)pgf);
}

static inline void __tlb_remove_table(void *_table)
{
	void *table = (void *)((unsigned long)_table & ~0x1UL);
	bool is_page = false;

	if ((unsigned long)_table & 0x1UL)
		is_page = true;
	pgtable_free(table, is_page);
}

void pgtable_free(void *table, bool is_page)
{
	if (is_page)
		__pte_free(table);
	else
		kmem_cache_free(pgtable_cache, table);
}
For pmd and pud levels, is_page is false, so we cannot do the
following in __tlb_remove_table_one():
```
ptdesc = table;
call_rcu(&ptdesc->pt_rcu_head, __tlb_remove_table_one_rcu);
```
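The low-bit tagging done by the sparc64 helpers quoted above can be sketched in plain userspace C. This is only an illustration of the encode/decode round trip, not kernel code: the names pgf_encode(), pgf_decode(), struct decoded_table and PGF_IS_PAGE are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the 0x1UL tag bit used by the sparc64 helpers. */
#define PGF_IS_PAGE ((uintptr_t)1)

/* Hypothetical result type: the untagged table pointer plus the flag. */
struct decoded_table {
	void *table;
	bool is_page;
};

/* Fold is_page into bit 0 of the pointer, as pgtable_free_tlb() does. */
static inline void *pgf_encode(void *table, bool is_page)
{
	uintptr_t pgf = (uintptr_t)table;

	/* The scheme only works if bit 0 of the real pointer is free. */
	assert((pgf & PGF_IS_PAGE) == 0);
	if (is_page)
		pgf |= PGF_IS_PAGE;
	return (void *)pgf;
}

/* Split the tagged value again, as __tlb_remove_table() does. */
static inline struct decoded_table pgf_decode(void *_table)
{
	uintptr_t pgf = (uintptr_t)_table;
	struct decoded_table d = {
		.table = (void *)(pgf & ~PGF_IS_PAGE),
		.is_page = (pgf & PGF_IS_PAGE) != 0,
	};

	return d;
}
```

The value handed to tlb_remove_table() is therefore an opaque cookie that only the arch-specific __tlb_remove_table() knows how to decode. Code that casts it directly to a descriptor pointer sees either a tagged (odd) address or, with is_page == false, a raw page table object.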
Is it also a problem on architectures like s390x and ppc, where we
squeeze multiple page tables into a single physical page?
For ppc, it's the same as for sparc64.
s390x supports MMU_GATHER_RCU_TABLE_FREE and defines its own
pxx_free_tlb(), but these all call tlb_remove_ptdesc(), so there is no
problem.
__pmd_free_tlb/__pud_free_tlb
  --> pgtable_free_tlb(tlb, pud/pmd, false)    <=== is_page == false
    --> tlb_remove_table
So in __tlb_remove_table_one(), the table cannot be treated as a
ptdesc because it does not have a pt_rcu_head member.
Hi David, it seems we still need to keep ARCH_SUPPORTS_PT_RECLAIM?
Or we invert it and only disable it for the known-problematic
architectures?
Yes, the problem lies with those architectures that support
MMU_GATHER_RCU_TABLE_FREE and define their own __tlb_remove_table().
So my plan is as follows:
1. convert __HAVE_ARCH_TLB_REMOVE_TABLE to a
   CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config option
2. make PT_RECLAIM depend on MMU_GATHER_RCU_TABLE_FREE &&
   !HAVE_ARCH_TLB_REMOVE_TABLE
I'll send v4 soon.
Thanks,
Qi