On Fri, Jul 11, 2025 at 07:16:00PM +0800, wang lian wrote:
> Hi Lorenzo Stoakes,
>
> >> + *
> >> + * This test deterministically validates process_madvise() with 
> >> MADV_COLLAPSE
> >> + * on a remote process, other advices are difficult to verify reliably.
> >> + *
> >> + * The test verifies that a memory region in a child process, initially
> >> + * backed by small pages, can be collapsed into a Transparent Huge Page 
> >> by a
> >> + * request from the parent. The result is verified by parsing the child's
> >> + * /proc/<pid>/smaps file.
> >> + */
>
> > This is clever and you've put a lot of effort in, but this just seems
> > absolutely prone to flaking and you're essentially testing something that's
> > highly automated.
>
> > I think you're also going way outside of the realms of testing
> > process_madvise() and are getting into testing essentially MADV_COLLAPSE
> > here.
>
> > We have to try to keep the test specific to what it is you're testing -
> > which is process_madvise() itself.
>
> > So for me, and I realise you've put a ton of work into this and I'm really
> > sorry to say it, I think you should drop this specific test.
>
> > For me simply testing the remote MADV_DONTNEED is enough.
>
> My motivation for this complex test came from the need to verify that
> the process_madvise operation was actually successful. Without checking
> the outcome, the test would only validate that the syscall returns the
> correct number of bytes, not that the advice truly took effect on the
> target process's memory.
>
> For remote calls, process_madvise is intentionally limited to
> non-destructive advice: MADV_COLD, MADV_PAGEOUT, MADV_WILLNEED,
> and MADV_COLLAPSE. However, verifying the effects of COLD, PAGEOUT,
> and WILLNEED is very difficult to do reliably in a selftest. This left
> MADV_COLLAPSE as what seemed to be the only verifiable option.
>
> But, as you correctly pointed out, MADV_COLLAPSE is too dependent on
> the system's THP state and prone to races with khugepaged. This is the
> very issue I tried to work around in v4 after the v3 test failures.
> So I think this test is necessary.
> As for your other opinions, I completely agree.

MADV_COLLAPSE is not a reliable test and we're going to end up with flakes. The
implementation as-is is unreliable, and I"m not sure there's any way to make it
not-unreliable.

This is especially true as we change THP behaviour over time. I don't want to
see failed test reports because of this.

I think it might be best to simply assert that the operation succesfully
completes without checking whether it actually executes the requested task - it
would render this functionality completely broken if it were not to actually do
what was requested.

>
>
>
> Best regards,
> Wang Lian

Reply via email to