Hi Lorenzo Stoakes,

> > Hi Lorenzo Stoakes,
> >
> > >> + *
> > >> + * This test deterministically validates process_madvise() with MADV_COLLAPSE
> > >> + * on a remote process, other advices are difficult to verify reliably.
> > >> + *
> > >> + * The test verifies that a memory region in a child process, initially
> > >> + * backed by small pages, can be collapsed into a Transparent Huge Page by a
> > >> + * request from the parent. The result is verified by parsing the child's
> > >> + * /proc/<pid>/smaps file.
> > >> + */
> > >
> > > This is clever and you've put a lot of effort in, but this just seems
> > > absolutely prone to flaking and you're essentially testing something that's
> > > highly automated.
> > >
> > > I think you're also going way outside of the realms of testing
> > > process_madvise() and are getting into testing essentially MADV_COLLAPSE
> > > here.
> > >
> > > We have to try to keep the test specific to what it is you're testing -
> > > which is process_madvise() itself.
> > >
> > > So for me, and I realise you've put a ton of work into this and I'm really
> > > sorry to say it, I think you should drop this specific test.
> > >
> > > For me simply testing the remote MADV_DONTNEED is enough.
> >
> > My motivation for this complex test came from the need to verify that
> > the process_madvise operation was actually successful. Without checking
> > the outcome, the test would only validate that the syscall returns the
> > correct number of bytes, not that the advice truly took effect on the
> > target process's memory.
> >
> > For remote calls, process_madvise is intentionally limited to
> > non-destructive advice: MADV_COLD, MADV_PAGEOUT, MADV_WILLNEED,
> > and MADV_COLLAPSE. However, verifying the effects of COLD, PAGEOUT,
> > and WILLNEED is very difficult to do reliably in a selftest. This left
> > MADV_COLLAPSE as what seemed to be the only verifiable option.
> >
> > But, as you correctly pointed out, MADV_COLLAPSE is too dependent on
> > the system's THP state and prone to races with khugepaged. This is the
> > very issue I tried to work around in v4 after the v3 test failures.
> > So I think this test is necessary.
> > As for your other opinions, I completely agree.
>
> MADV_COLLAPSE is not a reliable test and we're going to end up with flakes. The
> implementation as-is is unreliable, and I'm not sure there's any way to make it
> not-unreliable.
>
> This is especially true as we change THP behaviour over time. I don't want to
> see failed test reports because of this.
>
> I think it might be best to simply assert that the operation successfully
> completes without checking whether it actually executes the requested task - it
> would render this functionality completely broken if it were not to actually do
> what was requested.
>
> >
> > Best regards,
> > Wang Lian

Thank you for the clarification. You've convinced me.

Your suggestion provides a much cleaner path forward. It allows the test
to focus on the process_madvise syscall's interface (asserting the
successful return) without the flakiness of verifying side-effects that
are difficult to observe reliably. This makes the test much more robust.

I will update the patch to implement this clear assertion logic. Thank
you for guiding me to this better solution.

Best regards,
Wang Lian
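
For illustration only, the return-value-only check discussed above might look roughly like the sketch below. This is not the code from the patch. The names classify_result(), check_remote_madvise() and enum check_result are hypothetical. Treating ENOSYS and EPERM as "skip" rather than "fail" is an assumption, and the fallback syscall numbers (434 and 440) are assumed to match the common architectures.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Fallbacks for older headers; numbers assumed for common architectures. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif
#ifndef MADV_COLD
#define MADV_COLD 20
#endif

/* Hypothetical result type for this sketch. */
enum check_result { CHECK_PASS, CHECK_SKIP, CHECK_FAIL };

/*
 * Hypothetical helper: judge a process_madvise() return value only.
 * Success means every requested byte was advised. No side-effect on the
 * target's memory is inspected.
 */
enum check_result classify_result(ssize_t ret, size_t expected, int err)
{
	if (ret < 0) {
		/* Assumption: unsupported or unprivileged means skip. */
		if (err == ENOSYS || err == EPERM)
			return CHECK_SKIP;
		return CHECK_FAIL;
	}
	return ((size_t)ret == expected) ? CHECK_PASS : CHECK_FAIL;
}

/*
 * Hypothetical helper: apply 'advice' to [addr, addr + len) in the address
 * space of process 'pid' and classify the outcome.
 */
enum check_result check_remote_madvise(pid_t pid, void *addr, size_t len,
				       int advice)
{
	struct iovec vec = { .iov_base = addr, .iov_len = len };
	ssize_t ret;
	int pidfd, err;

	assert(addr != NULL && len > 0);

	pidfd = (int)syscall(SYS_pidfd_open, pid, 0U);
	if (pidfd < 0)
		return classify_result(-1, len, errno);

	ret = syscall(SYS_process_madvise, pidfd, &vec, (size_t)1, advice, 0U);
	err = errno;
	close(pidfd);

	return classify_result(ret, len, err);
}
```

A parent could call check_remote_madvise(child_pid, map, len, MADV_COLD) after fork(), where map was mmap()ed before the fork so that the address is valid in the child as well. A real selftest would presumably use the kselftest harness macros instead of this ad-hoc enum.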