This series provides optimized implementations of strnlen(), strchr(), and strrchr() for the RISC-V architecture. The strnlen implementation is derived from the existing optimized strlen. For strchr and strrchr, the current versions use simple byte-by-byte assembly logic, which will serve as a baseline for future Zbb-based optimizations.
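
For readers unfamiliar with the byte-by-byte approach mentioned above, here is a minimal C sketch of that logic. It is only an illustration and not the code in strchr.S or strrchr.S; the names sketch_strchr() and sketch_strrchr() are hypothetical.

```c
#include <stddef.h>

/* Hypothetical illustration: scan forward one byte at a time. */
char *sketch_strchr(const char *s, int c)
{
	for (;; s++) {
		if (*s == (char)c)
			return (char *)s;	/* also matches c == '\0' at the terminator */
		if (*s == '\0')
			return NULL;
	}
}

/* Hypothetical illustration: remember the most recent match until NUL. */
char *sketch_strrchr(const char *s, int c)
{
	const char *last = NULL;

	do {
		if (*s == (char)c)
			last = s;
	} while (*s++ != '\0');

	return (char *)last;
}
```

Each iteration handles exactly one byte, so the cost grows linearly with the string length regardless of alignment.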

The patch series is organized into three parts:

1. Correctness Testing: The first three patches add KUnit test cases
   for strlen, strnlen, and strrchr to ensure the baseline and
   optimized versions are functionally correct.

2. Benchmarking Tool: Patches 4 and 5 extend string_kunit to include
   performance measurement capabilities, allowing for comparative
   analysis within the KUnit environment.

3. Architectural Optimizations: The final three patches introduce the
   RISC-V specific assembly implementations.

Following suggestions from Andy Shevchenko, performance benchmarks have
been added to string_kunit.c to provide quantifiable evidence of the
improvements. Andy provided many specific comments on the implementation
of the benchmark logic, which is also inspired by Eric Biggers'
crc_benchmark().

Performance was measured in a QEMU TCG (rv64) environment, comparing the
generic C implementation with the new RISC-V assembly versions.

Performance Summary (Improvement %):
---------------------------------------------------------------
Function  |  16 B (Short) | 512 B (Mid) | 4096 B (Long)
---------------------------------------------------------------
strnlen   |    +64.0%     |   +346.2%   |    +410.7%
strchr    |     +4.0%     |     +6.4%   |      +1.5%
strrchr   |     +6.6%     |     +2.8%   |      +0.0%
---------------------------------------------------------------

The benchmarks can be reproduced by enabling CONFIG_STRING_KUNIT_BENCH
and running:

  ./tools/testing/kunit/kunit.py run --arch=riscv \
      --cross_compile=riscv64-linux-gnu- --kunitconfig=my_string.kunitconfig \
      --raw_output

The strnlen implementation leverages the Zbb 'orc.b' instruction and
word-at-a-time logic, showing significant gains as the string length
increases.

For strchr and strrchr, the handwritten assembly reduces fixed overhead
by eliminating stack frame management. The gain is most prominent on
short strings (1-16B) where function call overhead dominates, while the
performance converges with the C implementation for longer strings in
the TCG environment.
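
As a rough illustration of the word-at-a-time idea behind strnlen, the C sketch below emulates 'orc.b' in software. It is not the code in strnlen.S: sketch_orc_b() and sketch_strnlen() are hypothetical names, the word size is assumed to be 64 bits, and the final NUL is located with a plain byte loop for simplicity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Software stand-in for Zbb orc.b (assumption: 64-bit words):
 * every non-zero byte becomes 0xff, every zero byte stays 0x00.
 */
static uint64_t sketch_orc_b(uint64_t w)
{
	uint64_t out = 0;

	for (int i = 0; i < 8; i++) {
		if ((w >> (8 * i)) & 0xff)
			out |= (uint64_t)0xff << (8 * i);
	}
	return out;
}

/* Hypothetical illustration of a word-at-a-time strnlen. */
size_t sketch_strnlen(const char *s, size_t maxlen)
{
	size_t i = 0;

	/* Head: advance byte-wise until s + i is word-aligned. */
	while (i < maxlen && ((uintptr_t)(s + i) % sizeof(uint64_t)) != 0) {
		if (s[i] == '\0')
			return i;
		i++;
	}

	/* Body: test whole aligned words that fit entirely within maxlen. */
	while (maxlen - i >= sizeof(uint64_t)) {
		uint64_t w;

		memcpy(&w, s + i, sizeof(w));
		if (sketch_orc_b(w) != UINT64_MAX)
			break;	/* this word contains a NUL byte */
		i += sizeof(w);
	}

	/* Tail: find the exact NUL position, or stop at maxlen. */
	while (i < maxlen && s[i] != '\0')
		i++;

	return i;
}
```

The word loop may read bytes that follow the terminator within the same aligned word, so this sketch assumes the buffer is readable up to maxlen; the aligned loads are what keep such reads from crossing into a different page. With a hardware orc.b, the per-word check collapses into a single instruction, which is consistent with the gains growing with string length in the table above.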

I would like to thank Andy Shevchenko for the suggestion to add
benchmarks and for his detailed feedback on the test framework, and Eric
Biggers for the benchmarking approach. Thanks also to Joel Stanley for
testing support and feedback, and to David Laight for his suggestions
regarding performance measurement.

Changes:
v3:
- Re-implement benchmark logic inspired by crc_benchmark().
- Add 'len - 2' test case to strnlen correctness tests.
- Incorporate detailed benchmark data into individual commit messages.

v2:
- Refactored lib/string.c to export __generic_* functions and added
  corresponding functional/performance tests for strnlen, strchr, and
  strrchr (Andy Shevchenko).
- Replaced magic numbers with STRING_TEST_MAX_LEN etc. (Andy Shevchenko).

v1:
- Initial submission.

---
Feng Jiang (8):
  lib/string_kunit: add correctness test for strlen
  lib/string_kunit: add correctness test for strnlen
  lib/string_kunit: add correctness test for strrchr()
  lib/string_kunit: add performance benchmarks for strlen
  lib/string_kunit: extend benchmarks to strnlen and chr searches
  riscv: lib: add strnlen implementation
  riscv: lib: add strchr implementation
  riscv: lib: add strrchr implementation

 arch/riscv/include/asm/string.h |   9 ++
 arch/riscv/lib/Makefile         |   3 +
 arch/riscv/lib/strchr.S         |  35 +++++
 arch/riscv/lib/strnlen.S        | 164 ++++++++++++++++++++
 arch/riscv/lib/strrchr.S        |  37 +++++
 arch/riscv/purgatory/Makefile   |  11 +-
 lib/Kconfig.debug               |  11 ++
 lib/tests/string_kunit.c        | 258 ++++++++++++++++++++++++++++++++
 8 files changed, 527 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/lib/strchr.S
 create mode 100644 arch/riscv/lib/strnlen.S
 create mode 100644 arch/riscv/lib/strrchr.S

-- 
2.25.1
