On 2025/7/19 05:00, Nico Pache wrote:
On Thu, Jul 17, 2025 at 11:05 PM Baolin Wang
<baolin.w...@linux.alibaba.com> wrote:



On 2025/7/14 08:32, Nico Pache wrote:
With mTHP support in place, let's add the per-order mTHP stats for
exceeding the NONE, SWAP, and SHARED PTE thresholds.

Signed-off-by: Nico Pache <npa...@redhat.com>
---
   Documentation/admin-guide/mm/transhuge.rst | 17 +++++++++++++++++
   include/linux/huge_mm.h                    |  3 +++
   mm/huge_memory.c                           |  7 +++++++
   mm/khugepaged.c                            | 15 ++++++++++++---
   4 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 2c523dce6bc7..28c8af61efba 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -658,6 +658,23 @@ nr_anon_partially_mapped
          an anonymous THP as "partially mapped" and count it here, even though it
          is not actually partially mapped anymore.

+collapse_exceed_swap_pte
+       The number of anonymous THP which contain at least one swap PTE.
+       Currently khugepaged does not support collapsing mTHP regions that
+       contain a swap PTE.
+
+collapse_exceed_none_pte
+       The number of anonymous THP which have exceeded the none PTE threshold.
+       With mTHP collapse, a bitmap is used to gather the state of a PMD region
+       and is then recursively checked from largest to smallest order against
+       the scaled max_ptes_none count. This counter indicates that the current
+       order failed the check and the next enabled order will be checked.
+
+collapse_exceed_shared_pte
+       The number of anonymous THP which contain at least one shared PTE.
+       Currently khugepaged does not support collapsing mTHP regions that
+       contain a shared PTE.
+
   As the system ages, allocating huge pages may be expensive as the
   system uses memory compaction to copy data around memory to free a
   huge page for use. There are some counters in ``/proc/vmstat`` to help
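
(For context on the "scaled max_ptes_none count" mentioned above, here is a
minimal sketch of one way the scaling could work; the shift-based formula and
the helper name are assumptions for illustration, not code from this series:)

	/*
	 * Hypothetical helper: scale the PMD-level max_ptes_none tunable
	 * down to a smaller collapse order, halving the allowance for each
	 * order step below HPAGE_PMD_ORDER.
	 */
	static unsigned int scaled_max_ptes_none(unsigned int order)
	{
		return khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
	}
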
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4042078e8cc9..e0a27f80f390 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -141,6 +141,9 @@ enum mthp_stat_item {
       MTHP_STAT_SPLIT_DEFERRED,
       MTHP_STAT_NR_ANON,
       MTHP_STAT_NR_ANON_PARTIALLY_MAPPED,
+     MTHP_STAT_COLLAPSE_EXCEED_SWAP,
+     MTHP_STAT_COLLAPSE_EXCEED_NONE,
+     MTHP_STAT_COLLAPSE_EXCEED_SHARED,
       __MTHP_STAT_COUNT
   };

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e2ed9493df77..57e5699cf638 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -632,6 +632,10 @@ DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
   DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
   DEFINE_MTHP_STAT_ATTR(nr_anon, MTHP_STAT_NR_ANON);
    DEFINE_MTHP_STAT_ATTR(nr_anon_partially_mapped, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED);
+DEFINE_MTHP_STAT_ATTR(collapse_exceed_swap_pte, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
+DEFINE_MTHP_STAT_ATTR(collapse_exceed_none_pte, MTHP_STAT_COLLAPSE_EXCEED_NONE);
+DEFINE_MTHP_STAT_ATTR(collapse_exceed_shared_pte, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
+

   static struct attribute *anon_stats_attrs[] = {
       &anon_fault_alloc_attr.attr,
@@ -648,6 +652,9 @@ static struct attribute *anon_stats_attrs[] = {
       &split_deferred_attr.attr,
       &nr_anon_attr.attr,
       &nr_anon_partially_mapped_attr.attr,
+     &collapse_exceed_swap_pte_attr.attr,
+     &collapse_exceed_none_pte_attr.attr,
+     &collapse_exceed_shared_pte_attr.attr,
       NULL,
   };
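
(Once these attributes are wired up, the per-order counters should be readable
from the existing mTHP stats directory, e.g.
/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats/collapse_exceed_none_pte;
the filenames follow the DEFINE_MTHP_STAT_ATTR names above.)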

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d0c99b86b304..8a5873d0a23a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -594,7 +594,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
                               continue;
                       } else {
                               result = SCAN_EXCEED_NONE_PTE;
-                             count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+                             if (order == HPAGE_PMD_ORDER)
+                                     count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+                             else
+                                     count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_NONE);

Please follow the same logic as the other mTHP statistics; there is no
need to filter out the PMD-sized order, because mTHP also supports
PMD-sized orders. So the logic should be:

if (order == HPAGE_PMD_ORDER)
         count_vm_event(THP_SCAN_EXCEED_NONE_PTE);

count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_NONE);
Good point-- I will fix that!

                               goto out;
                       }
               }
@@ -623,8 +626,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
               /* See khugepaged_scan_pmd(). */
               if (folio_maybe_mapped_shared(folio)) {
                       ++shared;
-                     if (order != HPAGE_PMD_ORDER || (cc->is_khugepaged &&
-                         shared > khugepaged_max_ptes_shared)) {
+                     if (order != HPAGE_PMD_ORDER) {
+                             result = SCAN_EXCEED_SHARED_PTE;
+                             count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
+                             goto out;
+                     }

Ditto.
Thanks!
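
(For reference, applying that suggestion to the shared-PTE hunk above would
look roughly like the following sketch; the exact condition is inferred from
the quoted diff and is not the final patch:)

	if (folio_maybe_mapped_shared(folio)) {
		++shared;
		/* mTHP collapse does not support shared PTEs at all;
		 * PMD collapse tolerates up to max_ptes_shared of them. */
		if (order != HPAGE_PMD_ORDER ||
		    (cc->is_khugepaged && shared > khugepaged_max_ptes_shared)) {
			result = SCAN_EXCEED_SHARED_PTE;
			if (order == HPAGE_PMD_ORDER)
				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
			count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
			goto out;
		}
	}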

There is also the SWAP counter, which is slightly different: it is
counted during the scan phase, and in the mTHP case in the swapin
faulting code. I'm not sure whether we should also increment the
per-order counter for the PMD order during the scan phase, or just
leave it as a general vm_event counter, since the scan is not
attributed to an order. I believe the latter is the correct approach:
only attribute an order to it in __collapse_huge_page_swapin() for
mTHP collapses.

Yes, that latter approach sounds reasonable to me.
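
(A sketch of that latter approach, with the placement inside
__collapse_huge_page_swapin() and the surrounding error path assumed for
illustration only:)

	/* In __collapse_huge_page_swapin(), when a swap PTE is found while
	 * attempting a non-PMD (mTHP) order, attribute the failure to the
	 * order being collapsed; the PMD path keeps the vm_event counter. */
	if (order != HPAGE_PMD_ORDER) {
		result = SCAN_EXCEED_SWAP_PTE;
		count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
		goto out;
	}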
