On Tue, 14 Apr 2026 10:01:57 +0200
"David Hildenbrand (Arm)" <[email protected]> wrote:
> On 4/14/26 07:09, Dev Jain wrote:
> >
> >
> > On 14/04/26 12:57 am, David Hildenbrand (Arm) wrote:
> >> On 4/10/26 16:30, Dev Jain wrote:
> >>> The original version of mremap_test (7df666253f26: "kselftests: vm: add
> >>> mremap tests") validated remapped contents byte-by-byte and printed a
> >>> mismatch index in case the bytes streams are not equal. That made
> >>> validation expensive in both cases: for "no mismatch" (the common case
> >>> when
> >>> mremap is not buggy), it still walked all bytes in C; for "mismatch", it
> >>> broke out of the loop after printing the mismatch index.
> >>>
> >>> Later, my commit 7033c6cc9620 ("selftests/mm: mremap_test: optimize
> >>> execution time from minutes to seconds using chunkwise memcmp") tried to
> >>> optimize both cases by using chunk-wise memcmp() and only scanning bytes
> >>> within a range which has been determined by memcmp as mismatching.
> >>>
> >>> But get_sqrt() in that commit is buggy: `high = mid - 1` is applied
> >>> unconditionally. This makes the speed of checking the mismatch index
> >>> suboptimal.
> >>
> >> So is that the only problem with 7033c6cc9620: the speed?
> >
> > Yes.
> >
> > I'll explain the algorithm in 7033c6cc9620.
> >
> > The problem statement is: given two buffers of equal length n, find the
> > first mismatch index.
> >
> > Algorithm: Divide the buffers into sqrt(n) chunks. Do a memcmp() over
> > each chunk. If all of them succeed, the buffers are equal, giving the
> > result in O(sqrt(n)) * t, where t = time taken by memcmp().
> >
> > Otherwise, worst case is that we find the mismatch in the last chunk.
> > Now brute-force iterate this chunk to find the mismatch. Since chunk
> > size is sqrt(n), complexity is again
> > sqrt(n) * t + sqrt(n) = O(sqrt(n)) * t.
> >
> > So if get_sqrt() computes a wrong square root, we lose this time
> > complexity.
>
> Ah, thanks for clarifying.
>
> >
> > Maybe there is an optimal value of x = #number of chunks of the buffer,
> > which may not be sqrt(n).
> >
> > But given the information we have, a CS course on algorithms will
> > say this is one of the optimal ways to do it.
> >
> >>
> >>>
> >>> The mismatch index does not provide useful debugging value here: if
> >>> validation fails, we know mremap behavior is wrong, and the specific byte
> >>> offset does not make root-causing easier.
> >>
> >> Fully agreed.
> >>
> >>>
> >>> So instead of fixing get_sqrt(), bite the bullet, drop mismatch index
> >>> scanning and just compare the two byte streams with memcmp().
> >>
> >> How does this affect the execution time of the test?
> >
> > I just checked with ./mremap_test -t 0, the variance is very high on my
> > system.
> >
> > In the common case of the test passing:
> >
> > before patch, there are multiple sub-length calls to memcmp.
> > after patch, there is a single full-length call to memcmp.
> >
> > So the time should reduce but may not be very distinguishable.
>
> Okay, so doesn't matter. I agree that we should simplify all that.
>
> The exact index is irrelevant. Whoever wants to debug the test failure
> could modify the test to find that out. It's one of the tests we don't
> really expect to fail (often).
>
> >
> >>
> >>>
> >>> Reported-by: Sarthak Sharma <[email protected]>
> >>> Signed-off-by: Dev Jain <[email protected]>
> >>
> >> Fixes: 7033c6cc9620 ("selftests/mm: mremap_test: optimize execution time
> >> from minutes to seconds using chunkwise memcmp")
> >>
> >> ?
> >
> > Not needed. 7033c6cc9620 does not create any incorrectness in the checking
> > of mismatch index.
>
> Yes, agreed.
>
>
> I would suggest to rewrite/simplify/clarify the patch description, not
> talking about "buggy" etc, focusing on the simplification.
>
> "
> The original version of mremap_test (7df666253f26: "kselftests: vm: add
> mremap tests") validated remapped contents byte-by-byte and printed a
> mismatch index in case the bytes streams didn't match. That was rather
> inefficient, especially also if the test passed.
>
> Later, commit 7033c6cc9620 ("selftests/mm: mremap_test: optimize
> execution time from minutes to seconds using chunkwise memcmp") used
> memcmp() on bigger chunks, to fallback to byte-wise scanning to detect
> the problematic index only if it discovered a problem.
>
> However, the implementation is overly complicated (e.g., get_sqrt() is
> currently not optimal) and we don't really have to report the exact
> index: whoever debugs the failing test can figure that out.
>
> Let's simplify by just comparing both byte streams with memcmp() and not
> detecting the exact failed index.
> "
ISTM that if you get a failure it doesn't really matter how long a
second scan takes.
So a simple byte loop that reports the offset and data of the first
error and counts the number of errors is more than enough.
David