On Fri, 5 Jun 2020, Martin Storsjö wrote:
> > As for the speed of musl, it doesn't seem too bad, at least for strings:
> > https://www.etalabs.net/compare_libcs.html
> Those look decent yeah. My prime concern is for memcpy, where
> implementations that use SIMD instructions might be even faster - which
> might matter for multimedia applications.
> If I understand correctly, one of the reasons for having these string
> functions in vcruntime*.dll separately from the fixed UCRT, is that they
> want to be able to ship newer tuned versions of them more easily.
> But I could actually try to make a small benchmark for this, to see if
> there's any significant difference (and if the default one from
> api-ms-win-crt-private-* that is used right now isn't much faster, it
> isn't much of an issue).
> I also noticed that api-ms-win-crt-string-* actually does contain memcpy_s
> and memmove_s. So we could just have small wrappers that call these
> instead, so we'd avoid having to maintain a performance sensitive
> implementation of that. That leaves us with a few functions less where we
> need a full implementation.
I did a few measurements with this now, and the TL;DR conclusion is -
redirecting to ucrtbase/api-ms-win-crt-string's memcpy_s should be a good
option.
I did the measurements with the "checkasm" tool from dav1d, with local
modifications here:
https://code.videolan.org/mstorsjo/dav1d/-/commits/memcpy-bench
I ran the tests with "checkasm --bench --bench-c --test=memcpy", and
looked at the runtimes for the 1 MB aligned case.
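For readers without a dav1d checkout, the shape of such a measurement can be sketched in plain C. This is not the checkasm harness: it times with clock() instead of cycle counters, and the names bench_copy, BUF_SIZE and ALIGNMENT, as well as the 64-byte alignment, are assumptions made for illustration. A serious benchmark would also need to guard against the compiler optimizing the repeated copies.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE  (1u << 20) /* 1 MB, as in the measured case */
#define ALIGNMENT 64         /* assumed alignment, for illustration only */

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Round a pointer up to the next ALIGNMENT boundary. */
static unsigned char *align_up(unsigned char *p)
{
    uintptr_t v = (uintptr_t)p;
    return (unsigned char *)((v + ALIGNMENT - 1) & ~(uintptr_t)(ALIGNMENT - 1));
}

/* Returns the average time per copy in seconds, or a negative value if
 * allocation failed or the copy produced wrong data. */
double bench_copy(copy_fn fn, int iterations)
{
    unsigned char *raw_src = malloc(BUF_SIZE + ALIGNMENT);
    unsigned char *raw_dst = malloc(BUF_SIZE + ALIGNMENT);
    double result = -1.0;

    if (raw_src && raw_dst && iterations > 0) {
        unsigned char *src = align_up(raw_src);
        unsigned char *dst = align_up(raw_dst);

        memset(src, 0xA5, BUF_SIZE);
        fn(dst, src, BUF_SIZE); /* warm-up copy, not timed */

        clock_t start = clock();
        for (int i = 0; i < iterations; i++)
            fn(dst, src, BUF_SIZE);
        clock_t end = clock();

        if (memcmp(dst, src, BUF_SIZE) == 0)
            result = (double)(end - start) / CLOCKS_PER_SEC / iterations;
    }

    free(raw_src);
    free(raw_dst);
    return result;
}
```

Calling bench_copy(memcpy, 1000) and then bench_copy with another implementation gives a rough relative comparison; the absolute numbers are not comparable to the cycle counts reported by checkasm.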
On x86_64 linux, the results look like this (cycles, smaller is better):
musl_clang: 290599.7
glibc: 138632.0
musl_gcc: 138707.8
musl_x86_64_asm: 99238.8
On x86_64 windows, the relevant results are like this:
musl_clang_c: 279249.7
msvcrt.dll_memcpy: 202482.0
msvcrt.dll_memcpy_s: 134256.6
musl_gcc_c: 123527.4
vcruntime140.dll_memcpy: 101579.3
ucrtbase.dll_memcpy: 98145.5
ucrtbase.dll_memcpy_s: 97044.2
So the musl C code is pretty good when optimized by GCC, but clang does a
bad job with it. The musl x86_64 assembly implementation seems quite fast
for this testcase at least. I didn't try adapting the musl x86_64 assembly
implementation to the Windows calling convention, but extrapolating from the
results above, it looks like it'd be in line with the vcruntime/ucrtbase
results anyway.
So making a wrapper that just forwards memcpy to api-ms-win-crt-string's
memcpy_s should be a performant solution that avoids us having to maintain
that performance sensitive code.
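A minimal sketch of what such a forwarding wrapper could look like, not the actual mingw-w64 code. The names forwarding_memcpy and checked_copy are hypothetical; a real CRT would export the wrapper as memcpy itself. The non-Windows branch is only a stand-in so the sketch builds elsewhere, and it mimics just the basic memcpy_s contract.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifdef _WIN32
/* On Windows, memcpy_s is expected to be declared by the CRT headers. */
#define checked_copy memcpy_s
#else
/* Hypothetical stand-in for memcpy_s: returns 0 on success and an errno
 * value on failure, zeroing the destination where that is possible. */
static int checked_copy(void *dst, size_t dst_size, const void *src, size_t n)
{
    if (n == 0)
        return 0;
    if (dst == NULL)
        return EINVAL;
    if (src == NULL || dst_size < n) {
        memset(dst, 0, dst_size);
        return src == NULL ? EINVAL : ERANGE;
    }
    memcpy(dst, src, n);
    return 0;
}
#endif

/* Forward a plain memcpy call to the checked variant. The destination
 * size is passed as n, since memcpy has no separate capacity argument. */
void *forwarding_memcpy(void *dst, const void *src, size_t n)
{
    int rc = checked_copy(dst, n, src, n);
    assert(rc == 0); /* can only fail on invalid pointers here */
    (void)rc;        /* avoid an unused-variable warning with NDEBUG */
    return dst;
}
```

Because the destination size equals the copy length, the size check in memcpy_s can never trigger, so the wrapper keeps memcpy semantics for valid arguments while reusing the tuned copy loop.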
// Martin