Norihiro Tanaka wrote:
Could you try above cases?
Thanks, you're observing a 2.7x performance speedup with macros on your platform and your benchmark. With the same patch, I observed only a 1.18x speedup on the same benchmark. As usual, I'm testing with AMD Phenom II X4 910e + GCC 4.9.0 + Fedora 20 + default (-O2) optimization. I'm curious about why you're observing a much bigger performance difference with macros. What platform are you using?
Anyway, an 18% speedup is still a speedup, so I looked into it. GCC 4.9.0 misses a non-obvious opportunity for function inlining. I installed a tweak (attached) that should make the inlining opportunity obvious to compilers nowadays. On my platform this gave a 25% speedup, i.e., a bit better than the macro-using patch would have.
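To show the idea outside of grep, here is a minimal sketch of the same trick. The names count_trans and count are hypothetical stand-ins for bmexec_trans and bmexec, and the byte-counting loop is only an illustration, not the Boyer-Moore code. Both arms of the conditional make the identical call; the point is that each call site can be inlined with the answer to "is trans null?" already known, so the compiler may drop the per-byte test in each copy. Whether a given compiler actually does this depends on the compiler and its flags, and I have not measured this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for bmexec_trans: count the bytes of TEXT
   that equal C, mapping each byte through TRANS first if TRANS is
   non-null.  The test of TRANS inside the loop is what an inlined,
   specialized copy can eliminate.  */
static inline size_t
count_trans (unsigned char const *trans, char const *text, size_t size,
             unsigned char c)
{
  size_t n = 0;
  for (size_t i = 0; i < size; i++)
    {
      unsigned char b = text[i];
      if (trans)
        b = trans[b];
      n += b == c;
    }
  return n;
}

/* Hypothetical stand-in for the new bmexec.  The two arms are the
   same call on purpose: in the first arm TRANS is known to be
   non-null, in the second it is known to be null, so the compiler
   can inline two differently specialized copies.  */
static size_t
count (unsigned char const *trans, char const *text, size_t size,
       unsigned char c)
{
  return (trans
          ? count_trans (trans, text, size, c)
          : count_trans (trans, text, size, c));
}
```

Callers see no difference: count (NULL, buf, len, 'a') and count (table, buf, len, 'a') return the same results as calling count_trans directly. Only the generated code may change.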
From 61497fb5ccdad9973a71a04f73f9d4252609395b Mon Sep 17 00:00:00 2001
From: Paul Eggert <[email protected]>
Date: Sun, 27 Apr 2014 13:01:17 -0700
Subject: [PATCH] kwset: improve performance by inlining more

Problem reported by Norihiro Tanaka in <http://bugs.gnu.org/17229#55>.
* src/kwset.c (bmexec_trans): Rename from bmexec, and make it inline.
(bmexec): New implementation, which calls bmexec_trans.
This helps GCC inline more aggressively with the default optimization,
and improves performance 25% with the reported benchmark on my host.
---
 src/kwset.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/kwset.c b/src/kwset.c
index 8270f05..8e9b510 100644
--- a/src/kwset.c
+++ b/src/kwset.c
@@ -584,9 +584,9 @@ memchr_kwset (char const *s, size_t n, kwset_t kwset)
   return n == 0 ? NULL : memchr2 (s, kwset->gc1, kwset->gc1help, n);
 }
 
-/* Fast boyer-moore search. */
-static size_t _GL_ATTRIBUTE_PURE
-bmexec (kwset_t kwset, char const *text, size_t size)
+/* Fast Boyer-Moore search (inlinable version). */
+static inline size_t _GL_ATTRIBUTE_PURE
+bmexec_trans (kwset_t kwset, char const *text, size_t size)
 {
   unsigned char const *d1;
   char const *ep, *sp, *tp;
@@ -667,6 +667,17 @@ bmexec (kwset_t kwset, char const *text, size_t size)
   return -1;
 }
 
+/* Fast Boyer-Moore search. */
+static size_t
+bmexec (kwset_t kwset, char const *text, size_t size)
+{
+  /* Help the compiler inline bmexec_trans in two ways, depending on
+     whether kwset->trans is null. */
+  return (kwset->trans
+          ? bmexec_trans (kwset, text, size)
+          : bmexec_trans (kwset, text, size));
+}
+
 /* Hairy multiple string search. */
 static size_t _GL_ARG_NONNULL ((4))
 cwexec (kwset_t kwset, char const *text, size_t len, struct kwsmatch *kwsmatch)
-- 
1.9.0
