On Tue, Jan 20, 2026 at 02:58:44PM +0800, Feng Jiang wrote:
> This series provides optimized implementations of strnlen(), strchr(),
> and strrchr() for the RISC-V architecture. The strnlen implementation
> is derived from the existing optimized strlen. For strchr and strrchr,

strchr() and strrchr()

> the current versions use simple byte-by-byte assembly logic, which
> will serve as a baseline for future Zbb-based optimizations.
> 
> The patch series is organized into three parts:
> 1. Correctness Testing: The first three patches add KUnit test cases
>    for strlen, strnlen, and strrchr to ensure the baseline and optimized

strlen(), strnlen(), and strrchr()

>    versions are functionally correct.
> 2. Benchmarking Tool: Patches 4 and 5 extend string_kunit to include
>    performance measurement capabilities, allowing for comparative
>    analysis within the KUnit environment.
> 3. Architectural Optimizations: The final three patches introduce the
>    RISC-V specific assembly implementations.
> 
> Following suggestions from Andy Shevchenko, performance benchmarks have
> been added to string_kunit.c to provide quantifiable evidence of the
> improvements. Andy provided many specific comments on the implementation
> of the benchmark logic, which is also inspired by Eric Biggers'
> crc_benchmark(). Performance was measured in a QEMU TCG (rv64) environment,
> comparing the generic C implementation with the new RISC-V assembly versions.
> 
> Performance Summary (Improvement %):
> ---------------------------------------------------------------
> Function  |  16 B (Short) |  512 B (Mid) |  4096 B (Long)
> ---------------------------------------------------------------
> strnlen   |    +64.0%     |   +346.2%    |    +410.7%

This is still suspicious.

> strchr    |    +4.0%      |   +6.4%      |    +1.5%
> strrchr   |    +6.6%      |   +2.8%      |    +0.0%
> ---------------------------------------------------------------
> The benchmarks can be reproduced by enabling CONFIG_STRING_KUNIT_BENCH
> and running: ./tools/testing/kunit/kunit.py run --arch=riscv \
> --cross_compile=riscv64-linux-gnu- --kunitconfig=my_string.kunitconfig \
> --raw_output
> 
> The strnlen implementation leverages the Zbb 'orc.b' instruction and

strnlen()

> word-at-a-time logic, showing significant gains as the string length
> increases.

Hmm... Have you tried to optimise the generic implementation to use
word-at-a-time logic and compare?
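
To illustrate what I mean, here is a minimal userspace sketch of a generic
word-at-a-time strnlen(). It is not taken from the series and has not been
benchmarked; wat_strnlen() and has_zero_byte() are hypothetical names, and it
assumes 64-bit words and that an aligned 8-byte load which stays inside
maxlen is safe even when the NUL sits in the middle of that word (it cannot
cross a page boundary, but it is not strictly portable C).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  ((uint64_t)0x0101010101010101ULL)
#define HIGHS ((uint64_t)0x8080808080808080ULL)

/* Nonzero if and only if at least one byte of w is zero. */
static inline uint64_t has_zero_byte(uint64_t w)
{
	return (w - ONES) & ~w & HIGHS;
}

/* Hypothetical name: sketch of a generic word-at-a-time strnlen(). */
size_t wat_strnlen(const char *s, size_t maxlen)
{
	size_t i = 0;

	/* Byte loop until s + i is 8-byte aligned (or we are done). */
	while (i < maxlen && ((uintptr_t)(s + i) & 7)) {
		if (!s[i])
			return i;
		i++;
	}

	/* Word loop: only load words that lie entirely within maxlen. */
	while (maxlen - i >= 8) {
		uint64_t w;

		memcpy(&w, s + i, sizeof(w));
		if (has_zero_byte(w))
			break;
		i += 8;
	}

	/* Tail: find the exact NUL position or stop at maxlen. */
	while (i < maxlen && s[i])
		i++;

	return i;
}
```

Comparing something along these lines against the assembly would show how
much of the gain comes from the word-at-a-time approach itself and how much
from orc.b.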

> For strchr and strrchr, the handwritten assembly reduces

strchr() and strrchr()

> fixed overhead by eliminating stack frame management. The gain is most
> prominent on short strings (1-16B) where function call overhead dominates,
> while the performance converges with the C implementation for longer
> strings in the TCG environment.

-- 
With Best Regards,
Andy Shevchenko


