On Fri, 27 Jan 2023 at 17:16, Glenn Strauss <[email protected]> wrote:
>
> Fun!  A small exercise for comparison if you like.
> Cheers, Glenn
>
> void *
> memrchr(const void *s, int c, size_t n)
> {
>     const unsigned char *cp = (const unsigned char *)s + n;
>     const unsigned char ch = (unsigned char)c;
>     while (s != cp) {
>         if (*(--cp) == ch)
>             return (void *)cp;
>     }
>     return NULL;
> }

Hi all,

As promised, I played around a bit and ran a few experiments with
different memrchr() implementations.

Everything I did can be found here:
https://github.com/mmayer/cgit/tree/memrchr-compare

The test-specific code is in the memrchr_test folder[1] within that repo.

The four implementations I tried are:

memrchr:  the original implementation (from Apple's sudo command) that I
          submitted as v1
memrchr2: Alejandro's suggestion
memrchr3: Glenn's suggestion
memrchr4: for added fun, musl-libc's implementation[2]

I also checked the object and assembly files into the repo, so it's easier
to look at them if anybody wants to. They live in the memrchr_test/output
folder.

Here are the results for ARM and x86, both in assembly/object size and
runtime.

ARM

# Object size of memrchr and memrchr2 is the same
-rw-r--r--  1 mmayer  staff  552 29 Jan 09:52 memrchr.o
-rw-r--r--  1 mmayer  staff  552 29 Jan 09:52 memrchr2.o
-rw-r--r--  1 mmayer  staff  544 29 Jan 09:52 memrchr3.o
-rw-r--r--  1 mmayer  staff  544 29 Jan 09:52 memrchr4.o

# Assembly source of memrchr2 is larger than memrchr
-rw-r--r--  1 mmayer  staff  694 29 Jan 09:52 memrchr2.s
-rw-r--r--  1 mmayer  staff  691 29 Jan 09:52 memrchr.s
-rw-r--r--  1 mmayer  staff  655 29 Jan 09:52 memrchr3.s
-rw-r--r--  1 mmayer  staff  655 29 Jan 09:52 memrchr4.s

execution time: 18.61453 seconds
execution time: 15.39163 seconds
execution time: 13.56957 seconds
execution time: 13.55493 seconds

x86

-rw-r--r--  1 mmayer  staff  656 29 Jan 10:02 memrchr.o
-rw-r--r--  1 mmayer  staff  656 29 Jan 10:02 memrchr2.o
-rw-r--r--  1 mmayer  staff  656 29 Jan 10:02 memrchr3.o
-rw-r--r--  1 mmayer  staff  648 29 Jan 10:02 memrchr4.o

-rw-r--r--  1 mmayer  staff  835 29 Jan 10:02 memrchr.s
-rw-r--r--  1 mmayer  staff  835 29 Jan 10:02 memrchr2.s
-rw-r--r--  1 mmayer  staff  825 29 Jan 10:02 memrchr3.s
-rw-r--r--  1 mmayer  staff  818 29 Jan 10:02 memrchr4.s

execution time: 20.29937 seconds
execution time: 23.67755 seconds
execution time: 12.59514 seconds
execution time: 11.38668 seconds

As you can
see, musl-libc provides the smallest implementation that is also the
fastest. This is true for ARM and x86 (on ARM it ties with memrchr3 in
object and assembly size; on x86 it is the smallest outright).

So, I guess it makes the most sense to pick that (memrchr4.c in my
experiments). The code is under an MIT license, which I assume is fine for
CGIT.

What does everybody think?

Regards,
-Markus

[1] https://github.com/mmayer/cgit/tree/memrchr-compare/memrchr_test
[2] https://git.musl-libc.org/cgit/musl/tree/src/string/memrchr.c
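
P.S. For anyone who wants to try this kind of comparison without cloning
the repo, here is a minimal sketch. It is not the harness from
memrchr_test, and all four function names are hypothetical, made up for
illustration. memrchr_ptr has the same body as the function quoted at the
top of this mail. memrchr_countdown is an index-based count-down loop
written for this sketch; see [2] for the actual musl source. The timing
helper uses clock() on a buffer with no match, and an optimizing compiler
may hoist or simplify the loop, so treat its numbers as rough at best.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical variant: index-based count-down loop (illustration only). */
void *memrchr_countdown(const void *m, int c, size_t n)
{
    const unsigned char *s = m;
    const unsigned char ch = (unsigned char)c;

    while (n--) {
        if (s[n] == ch)
            return (void *)(s + n);
    }
    return NULL;
}

/* Pointer-walking variant, same body as the function quoted above. */
void *memrchr_ptr(const void *s, int c, size_t n)
{
    const unsigned char *cp = (const unsigned char *)s + n;
    const unsigned char ch = (unsigned char)c;

    while (s != cp) {
        if (*(--cp) == ch)
            return (void *)cp;
    }
    return NULL;
}

/* Returns how many (length, byte) inputs make the two variants disagree. */
int count_mismatches(void)
{
    static const char buf[] = "a/b/c.txt";
    const char probes[] = { '/', 'a', 't', 'x', '\0' };
    int bad = 0;

    /* n runs from 0 up to and including the terminating NUL byte. */
    for (size_t n = 0; n <= sizeof(buf); n++) {
        for (size_t i = 0; i < sizeof(probes); i++) {
            if (memrchr_countdown(buf, probes[i], n) !=
                memrchr_ptr(buf, probes[i], n))
                bad++;
        }
    }
    return bad;
}

/* Rough CPU time in seconds for iters calls that scan the whole buffer. */
double time_impl(void *(*fn)(const void *, int, size_t), long iters)
{
    static unsigned char hay[4096];    /* zero-filled, so 'x' never matches */
    const void *volatile sink = NULL;  /* keeps the result observably used */
    clock_t start;

    assert(fn != NULL);
    start = clock();
    for (long i = 0; i < iters; i++)
        sink = fn(hay, 'x', sizeof(hay));
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling count_mismatches() from a small main() should return 0, since both
variants return the last match within the first n bytes or NULL. Passing
each function to time_impl() with the same iteration count gives a rough
side-by-side number.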
