On Wed, Jan 14, 2026 at 03:04:58PM +0800, Feng Jiang wrote:
> On 2026/1/14 14:14, Feng Jiang wrote:
> > On 2026/1/13 16:46, Andy Shevchenko wrote:
...
> > Thank you for the catch. You are absolutely correct: the 2500x figure is
> > heavily distorted and does not reflect real-world performance.
> >
> > I've found that by using a volatile function pointer to call the
> > implementations (instead of direct calls), the results returned to a
> > realistic range. It appears the previous benchmark logic allowed the
> > compiler to over-optimize the test loop in ways that skewed the data.
> >
> > I will refactor the benchmark logic in v3, specifically referencing the
> > crc32 KUnit implementation (e.g., using warm-up loops and adding
> > preempt_disable() to eliminate context-switch interference) to ensure the
> > data is robust and accurate.
> >
>
> Just a quick follow-up: I've also verified that using a volatile variable to
> store the return value (as seen in crc_benchmark()) is equally effective at
> preventing the optimization.
>
> The core change is as follows:
>
> volatile size_t len;
> ...
> for (unsigned int j = 0; j < iters; j++) {
> 	OPTIMIZER_HIDE_VAR(buf);
> 	len = strlen(buf);
But please make sure that this really is the Linux kernel generic implementation
(before) and not __builtin_strlen() from GCC. (OTOH, it would be nice to
benchmark that one as well, although I think that __builtin_strlen() in general
may be a slightly better choice than the Linux kernel generic implementation.)
I.o.w. be sure *what* you test.
> }
Or use WRITE_ONCE() :-) But that one would probably be confusing, as it usually
should be paired with READ_ONCE() somewhere else in the code. So, I agree with
the crc_benchmark() approach taken.
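
A minimal userspace sketch of that pattern, to make the point concrete. It is
not the KUnit code: generic_strlen(), HIDE_VAR() and bench_ns() are names made
up for illustration, HIDE_VAR() only approximates OPTIMIZER_HIDE_VAR() and
assumes GCC or Clang, and clock_gettime() stands in for whatever clock the
kernel test uses. Calling through a function pointer to a differently named
function also avoids an accidental substitution by __builtin_strlen().

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a generic byte-at-a-time strlen(). */
static size_t generic_strlen(const char *s)
{
	const char *p = s;

	while (*p)
		p++;
	return (size_t)(p - s);
}

/*
 * Userspace approximation of OPTIMIZER_HIDE_VAR() (GCC/Clang only): an
 * empty asm that makes the compiler forget what it knows about var.
 */
#define HIDE_VAR(var) __asm__("" : "+r"(var))

typedef size_t (*strlen_fn)(const char *);

/* Time iters calls of fn(buf); the last result is returned via *last_len. */
static uint64_t bench_ns(strlen_fn fn, const char *buf, unsigned int iters,
			 size_t *last_len)
{
	volatile size_t len = 0;	/* every store to len must be emitted */
	struct timespec t0, t1;
	int64_t ns;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (unsigned int j = 0; j < iters; j++) {
		HIDE_VAR(buf);		/* buf may have changed: no hoisting */
		len = fn(buf);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*last_len = len;
	ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
	     (t1.tv_nsec - t0.tv_nsec);
	return (uint64_t)ns;
}
```

Even with this in place it is worth looking at the disassembly of the loop,
since that is the only way to be sure which strlen() variant ends up being
called.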
> Preliminary results with this change look much more reasonable:
>
> ok 4 string_test_strlen
> # string_test_strlen_bench: strlen performance (short, len: 8, iters: 100000):
> # string_test_strlen_bench: arch-optimized: 4767500 ns
> # string_test_strlen_bench: generic C: 5815800 ns
> # string_test_strlen_bench: speedup: 1.21x
> # string_test_strlen_bench: strlen performance (medium, len: 64, iters: 100000):
> # string_test_strlen_bench: arch-optimized: 6573600 ns
> # string_test_strlen_bench: generic C: 16342500 ns
> # string_test_strlen_bench: speedup: 2.48x
> # string_test_strlen_bench: strlen performance (long, len: 2048, iters: 10000):
> # string_test_strlen_bench: arch-optimized: 7931000 ns
> # string_test_strlen_bench: generic C: 35347300 ns
> # string_test_strlen_bench: speedup: 4.45x
> ok 5 string_test_strlen_bench
>
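
The speedup lines above are consistent with a truncating integer division of
the two timings (e.g. 5815800 / 4767500 = 1.2199..., reported as 1.21x). A
sketch of that arithmetic in two-decimal fixed point, which avoids floating
point; whether the patch computes it this way is a guess, and speedup_x100()
and format_speedup() are hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Speedup of arch over generic, scaled by 100 and truncated (121 == 1.21x). */
static uint64_t speedup_x100(uint64_t generic_ns, uint64_t arch_ns)
{
	if (arch_ns == 0)
		return 0;	/* no meaningful ratio; avoid dividing by zero */
	return generic_ns * 100 / arch_ns;
}

/* Format the ratio as "<int>.<2 digits>x"; returns snprintf()'s result. */
static int format_speedup(char *out, size_t size, uint64_t generic_ns,
			  uint64_t arch_ns)
{
	uint64_t x100 = speedup_x100(generic_ns, arch_ns);

	return snprintf(out, size, "%llu.%02llux",
			(unsigned long long)(x100 / 100),
			(unsigned long long)(x100 % 100));
}
```

With the timings quoted above this yields 121, 248 and 445, i.e. the reported
1.21x, 2.48x and 4.45x. Rounding to nearest would instead give 1.22x, 2.49x
and 4.46x.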
> I will adopt this pattern in v3, along with cache warm-up and
> preempt_disable(), to stay consistent with existing kernel benchmarks and
> ensure robust measurements.
--
With Best Regards,
Andy Shevchenko