On 2025/5/28 17:26, David Hildenbrand wrote:
On 22.05.25 11:39, Baolin Wang wrote:
On 2025/5/21 18:23, Nico Pache wrote:
On Tue, May 20, 2025 at 4:09 AM Baolin Wang
<baolin.w...@linux.alibaba.com> wrote:
Sorry for late reply.
On 2025/5/17 14:47, Nico Pache wrote:
On Thu, May 15, 2025 at 9:20 PM Baolin Wang
<baolin.w...@linux.alibaba.com> wrote:
On 2025/5/15 11:22, Nico Pache wrote:
khugepaged scans anons PMD ranges for potential collapse to a
hugepage.
To add mTHP support we use this scan to instead record chunks of
utilized
sections of the PMD.
khugepaged_scan_bitmap uses a stack struct to recursively scan a
bitmap
that represents chunks of utilized regions. We can then determine
what
mTHP size fits best and in the following patch, we set this
bitmap while
scanning the anon PMD. A minimum collapse order of 2 is used as
this is
the lowest order supported by anon memory.
max_ptes_none is used as a scale to determine how "full" an order
must
be before being considered for collapse.
When attempting to collapse an order that has its order set to
"always"
lets always collapse to that order in a greedy manner without
considering the number of bits set.
Signed-off-by: Nico Pache <npa...@redhat.com>
Sigh. You still haven't addressed or explained the issues I
previously
raised [1], so I don't know how to review this patch again...
Can you still reproduce this issue?
Yes, I can still reproduce this issue with today's (5/20) mm-new
branch.
I've disabled PMD-sized THP in my system:
[root]# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
[root]# cat
/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
always inherit madvise [never]
And I tried calling madvise() with MADV_COLLAPSE for anonymous memory,
and I can still see it collapsing to a PMD-sized THP.
Hi Baolin ! Thank you for your reply and willingness to test again :)
I didn't realize we were talking about madvise collapse-- this makes
sense now. I also figured out why I could "reproduce" it before. My
script was always enabling the THP settings in two places, and I only
commented out one to test this. But this time I was doing more manual
testing.
The original design of madvise_collapse ignores the sysfs and
collapses even if you have an order disabled. I believe this behavior
is wrong, but by design. I spent some time playing around with madvise
collapses with and w/o my changes. This is not a new thing, I
reproduced the issue in 6.11 (Fedora 41), and I think its been
possible since the inception of madvise collapse 3 years ago. I
noticed a similar behavior on one of my RFC since it was "breaking"
selftests, and the fix was to reincorporate this broken sysfs
behavior.
OK. Thanks for the explanation.
7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage
collapse")
"This call is independent of the system-wide THP sysfs settings, but
will fail for memory marked VM_NOHUGEPAGE."
The second condition holds true (and fails for VM_NOHUGEPAGE), but I
dont know if we actually want madvise_collapse to be independent of
the system-wide.
This design principle surprised me a bit, and I failed to find the
reason in the commit log. I agree that "never should mean never," and we
should respect the THP/mTHP sysfs setting. Additionally, for the
'shmem_enabled' sysfs interface controlled for shmem/tmpfs, THP collapse
can still be prohibited through the 'deny' configuration. The rules here
are somewhat confusing.
I recall that we decided to overwrite "VM_NOHUGEPAGE", because the
assumption is that the same app that triggered MADV_NOHUGEPAGE triggers
the collapse. So the app decides on its own behavior.
Similarly, allowing for collapsing in a VM without VM_HUGEPAGE in the
"madvise" mode would be fine.
But in the "never" case, we should just "never" collapse.
OK. Let's fix the "never" case first. Thanks.