Hi Lorenzo Stoakes,

> > Hi Lorenzo Stoakes,
> >
> > >> + *
> > >> + * This test deterministically validates process_madvise() with MADV_COLLAPSE
> > >> + * on a remote process, other advices are difficult to verify reliably.
> > >> + *
> > >> + * The test verifies that a memory region in a child process, initially
> > >> + * backed by small pages, can be collapsed into a Transparent Huge Page by a
> > >> + * request from the parent. The result is verified by parsing the child's
> > >> + * /proc/<pid>/smaps file.
> > >> + */
> >
> > > This is clever and you've put a lot of effort in, but this just seems
> > > absolutely prone to flaking and you're essentially testing something that's
> > > highly automated.
> >
> > > I think you're also going way outside of the realms of testing
> > > process_madvise() and are getting into testing essentially MADV_COLLAPSE
> > > here.
> >
> > > We have to try to keep the test specific to what it is you're testing -
> > > which is process_madvise() itself.
> >
> > > So for me, and I realise you've put a ton of work into this and I'm really
> > > sorry to say it, I think you should drop this specific test.
> >
> > > For me simply testing the remote MADV_DONTNEED is enough.
> >
> > My motivation for this complex test came from the need to verify that
> > the process_madvise operation was actually successful. Without checking
> > the outcome, the test would only validate that the syscall returns the
> > correct number of bytes, not that the advice truly took effect on the
> > target process's memory.
> >
> > For remote calls, process_madvise is intentionally limited to
> > non-destructive advice: MADV_COLD, MADV_PAGEOUT, MADV_WILLNEED,
> > and MADV_COLLAPSE. However, verifying the effects of COLD, PAGEOUT,
> > and WILLNEED is very difficult to do reliably in a selftest. This left
> > MADV_COLLAPSE as what seemed to be the only verifiable option.
> >
> > But, as you correctly pointed out, MADV_COLLAPSE is too dependent on
> > the system's THP state and prone to races with khugepaged. This is the
> > very issue I tried to work around in v4 after the v3 test failures.
> > So I think this test is necessary.
> > As for your other opinions, I completely agree.

> MADV_COLLAPSE is not a reliable test and we're going to end up with flakes.
> The implementation as-is is unreliable, and I'm not sure there's any way to
> make it not-unreliable.

> This is especially true as we change THP behaviour over time. I don't want to
> see failed test reports because of this.

> I think it might be best to simply assert that the operation successfully
> completes without checking whether it actually executes the requested task -
> it would render this functionality completely broken if it were not to
> actually do what was requested.

> >
> >
> >
> > Best regards,
> > Wang Lian

Thank you for the clarification. You've convinced me.

Your suggestion provides a much cleaner path forward. It allows the test
to focus on the process_madvise syscall's interface, asserting the
successful return, without the flakiness of verifying side effects that
are difficult to observe reliably. This makes the test much more robust.
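For illustration, a minimal sketch of what an assertion-only remote test
could look like (not the actual updated patch). Assumptions: it uses
MADV_COLD rather than MADV_COLLAPSE so the check does not depend on THP
state; test_remote_madv_cold(), is_skip_errno() and the RESULT_* values
are hypothetical names; ENOSYS/EPERM/EACCES are treated as skips because
remote process_madvise() needs CAP_SYS_NICE and ptrace read access to the
target, which sandboxed or unprivileged runs may lack.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks for older headers; 434/440 are the common syscall numbers
 * on most architectures (alpha and ia64 differ). */
#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434
#endif
#ifndef __NR_process_madvise
#define __NR_process_madvise 440
#endif
#ifndef MADV_COLD
#define MADV_COLD 20
#endif

/* Hypothetical tri-state result, mirroring kselftest pass/skip/fail. */
enum result { RESULT_PASS, RESULT_SKIP, RESULT_FAIL };

/* Errors meaning "this environment can't run the test", not a kernel bug:
 * ENOSYS (no syscall), EPERM (no CAP_SYS_NICE or seccomp), EACCES (ptrace). */
static int is_skip_errno(int err)
{
	return err == ENOSYS || err == EPERM || err == EACCES;
}

/* Hypothetical helper: advise a child's memory remotely and assert only
 * that process_madvise() reports the whole range as processed. */
static enum result test_remote_madv_cold(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	size_t len = 4 * (size_t)page_size;
	enum result res = RESULT_FAIL;
	struct iovec iov;
	ssize_t ret;
	pid_t child;
	int pidfd;
	char *buf;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return RESULT_FAIL;
	memset(buf, 1, len);	/* fault the pages in before forking */

	child = fork();
	if (child < 0) {
		munmap(buf, len);
		return RESULT_FAIL;
	}
	if (child == 0) {
		/* The child inherits the mapping at the same address; idle until killed. */
		for (;;)
			pause();
	}

	pidfd = (int)syscall(__NR_pidfd_open, child, 0);
	if (pidfd < 0) {
		res = is_skip_errno(errno) ? RESULT_SKIP : RESULT_FAIL;
		goto out;
	}

	iov.iov_base = buf;
	iov.iov_len = len;
	ret = syscall(__NR_process_madvise, pidfd, &iov, (size_t)1, MADV_COLD, 0);
	if (ret < 0)
		res = is_skip_errno(errno) ? RESULT_SKIP : RESULT_FAIL;
	else
		/* Interface contract only: the child's memory is not inspected. */
		res = (size_t)ret == len ? RESULT_PASS : RESULT_FAIL;
	close(pidfd);
out:
	kill(child, SIGKILL);
	waitpid(child, NULL, 0);
	munmap(buf, len);
	return res;
}
```

A selftest runner would map RESULT_SKIP to a kselftest skip rather than a
failure, so missing privileges never show up as a flaky test report.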

I will update the patch to implement this clear assertion logic. Thank
you for guiding me to this better solution.


Best regards,
Wang Lian
