Hi David, On Sat, Jun 18, 2022 at 12:52:23PM +0100, David CARLIER wrote: > From 9d7b6448a2407451c3115b701c51f97ab2bf6a59 Mon Sep 17 00:00:00 2001 > From: David Carlier <devne...@gmail.com> > Date: Sat, 18 Jun 2022 12:41:11 +0100 > Subject: [PATCH] MINOR: compiler __builtin_memcpy_inline usage introduction. > > Optimised version of the existing __builtin_memcpy builtin, differences > reside by the fact it works only with constant time sizes and does > generate extra calls. > At the moment, is clang exclusive even tough GCC does not seem to > want to implement it, the comments try not to reflect this current > state. > Usage can be expanded, used purposely only in few places for starter. > --- > include/haproxy/compiler.h | 12 ++++++++++++ > src/haproxy.c | 4 ++-- > src/log.c | 2 +- > src/proxy.c | 2 +- > 4 files changed, 16 insertions(+), 4 deletions(-) > > diff --git a/include/haproxy/compiler.h b/include/haproxy/compiler.h > index 66b5f5835..dca9f6aef 100644 > --- a/include/haproxy/compiler.h > +++ b/include/haproxy/compiler.h > @@ -192,6 +192,18 @@ > #endif > #endif > > +/* > + * This builtin works only with compile time lengths unlike __builtin_memcpy. > + * Also guarantees no external call is generated. > + * We could `replicate` the aforementioned constraint with a > + * _Static_assert/__builtin_constant_p theoretically, but that might be > + * too much trouble to make it happens (C11 min) > + * so here we trust the proper usage with other compilers (and the CI > infrastructure). > + */ > +#if !defined(__clang__) || __clang_major__ < 11 > +#define __builtin_memcpy_inline(x, y, s) memcpy(x, y, s) > +#endif > +
Hmmm please no, this is not the right way to do it for two reasons: - it will encourage use of __builtin_memcpy_inline() making one believe it's the expected one when it isn't ; - developing on non-clang systems will not report the problem with the non-constant size that __builtin_memcpy_inline() expects, resulting in breakage from time to time. I'd suggest that instead you create a macro named ha_memcpy_inline() that maps to __builtin_memcpy_inline() when present and s is a builtin constant, or maps to __builtin_memcpy() for the rest. Please note, by the way, that gcc since at least 3.4 has been emitting the same instructions (rep movsb) as clang's __builtin_memcpy_inline(), but only when optimizing for size. When optimizing for anything else, it can emit ton of inlined garbage^Wvector instructions. Thus I suspect we could achieve the same result by doing a clever arrangement of #pragma GCC push_options / #pragma GCC optimize("Os,inline") around an inline function definition that does the memcpy, and that is called by the macro. I have not tried, but probably something like this would do it: #pragma GCC push_options #pragma GCC optimize("Os,inline") static inline void _ha_memcpy_inline(void *restrict dest, const void *restrict src, size_t n) { __builtin_memcpy(dest, src, n); } #pragma GCC pop_options #if defined(clang...) #define ha_memcpy_inline(d,s,n) do { if (__builtin_constant_p(n)) __builtin_memcpy_inline(d,s,n); else _ha_memcpy_inline(d,s,n); } while (0) #else #define ha_memcpy_inline(d,s,n) _ha_memcpy_inline(d,s,n) #endif That's not even compile-tested and I haven't looked at the result, but you get the idea. Now regarding where to use it... I'd say it depends on a lot of factors, size, speed, undesired dependency on external libs. I think that each and every call place does warrant an individual commit with a justification and a check that the benefit matches expectations. There could be some value in using this in various low-level protocol layers (QUIC, HTX, SPOE). Maybe more, I don't know. Thanks, Willy