On Fri, Jul 11, 2025 at 07:16:00PM +0800, wang lian wrote: > Hi Lorenzo Stoakes, > > >> + * > >> + * This test deterministically validates process_madvise() with > >> MADV_COLLAPSE > >> + * on a remote process, other advices are difficult to verify reliably. > >> + * > >> + * The test verifies that a memory region in a child process, initially > >> + * backed by small pages, can be collapsed into a Transparent Huge Page > >> by a > >> + * request from the parent. The result is verified by parsing the child's > >> + * /proc/<pid>/smaps file. > >> + */ > > > This is clever and you've put a lot of effort in, but this just seems > > absolutely prone to flaking and you're essentially testing something that's > > highly automated. > > > I think you're also going way outside of the realms of testing > > process_madvise() and are getting into testing essentially MADV_COLLAPSE > > here. > > > We have to try to keep the test specific to what it is you're testing - > > which is process_madvise() itself. > > > So for me, and I realise you've put a ton of work into this and I'm really > > sorry to say it, I think you should drop this specific test. > > > For me simply testing the remote MADV_DONTNEED is enough. > > My motivation for this complex test came from the need to verify that > the process_madvise operation was actually successful. Without checking > the outcome, the test would only validate that the syscall returns the > correct number of bytes, not that the advice truly took effect on the > target process's memory. > > For remote calls, process_madvise is intentionally limited to > non-destructive advice: MADV_COLD, MADV_PAGEOUT, MADV_WILLNEED, > and MADV_COLLAPSE. However, verifying the effects of COLD, PAGEOUT, > and WILLNEED is very difficult to do reliably in a selftest. This left > MADV_COLLAPSE as what seemed to be the only verifiable option. > > But, as you correctly pointed out, MADV_COLLAPSE is too dependent on > the system's THP state and prone to races with khugepaged. This is the > very issue I tried to work around in v4 after the v3 test failures. > So I think this test is necessary. > As for your other opinions, I completely agree.
MADV_COLLAPSE is not a reliable test and we're going to end up with flakes. The implementation as-is is unreliable, and I"m not sure there's any way to make it not-unreliable. This is especially true as we change THP behaviour over time. I don't want to see failed test reports because of this. I think it might be best to simply assert that the operation succesfully completes without checking whether it actually executes the requested task - it would render this functionality completely broken if it were not to actually do what was requested. > > > > Best regards, > Wang Lian