Norihiro Tanaka wrote:
> Could you try above cases?

Thanks. You're observing a 2.7x speedup with macros on your platform and benchmark, but with the same patch I observed only a 1.18x speedup on the same benchmark. As usual, I'm testing with an AMD Phenom II X4 910e, GCC 4.9.0, and Fedora 20, with default (-O2) optimization. I'm curious why you're seeing a much bigger difference with macros. What platform are you using?

Anyway, an 18% speedup is still a speedup, so I looked into it. GCC 4.9.0 misses a non-obvious opportunity for function inlining. I installed a tweak (attached) that should make the inlining opportunity obvious to current compilers. On my platform this gave a 28% speedup, i.e., a bit better than the macro-using patch would have given.
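
To illustrate the idea outside of kwset.c, here is a minimal sketch of the same trick. The names (struct searcher, find_byte_trans, find_byte) are hypothetical and not part of grep; the loop is a plain byte scan standing in for the real Boyer-Moore code. The point is only that a wrapper with two textually identical calls gives the compiler a chance to inline the worker twice and specialize each copy on whether the translation table is null. Whether a given compiler actually does so depends on its version and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical searcher state; only the optional translation table
   matters for this sketch.  */
struct searcher
{
  unsigned char const *trans;   /* NULL means "no translation" */
  unsigned char target;         /* byte to look for, already translated */
};

/* Inlinable worker.  The test of s->trans sits inside the hot loop, so a
   single out-of-line copy would repeat it for every byte.  */
static inline size_t
find_byte_trans (struct searcher const *s, char const *text, size_t size)
{
  for (size_t i = 0; i < size; i++)
    {
      unsigned char c = text[i];
      if (s->trans)
        c = s->trans[c];
      if (c == s->target)
        return i;
    }
  return (size_t) -1;
}

/* Wrapper with two textually identical calls.  After inlining, the
   compiler may specialize each copy using what the ?: condition
   implies about s->trans in that arm.  */
static size_t
find_byte (struct searcher const *s, char const *text, size_t size)
{
  assert (text != NULL || size == 0);
  return (s->trans
          ? find_byte_trans (s, text, size)
          : find_byte_trans (s, text, size));
}
```

Callers use only find_byte; the result is the same whichever arm runs, so the duplication affects code generation, not behavior.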

From 61497fb5ccdad9973a71a04f73f9d4252609395b Mon Sep 17 00:00:00 2001
From: Paul Eggert <[email protected]>
Date: Sun, 27 Apr 2014 13:01:17 -0700
Subject: [PATCH] kwset: improve performance by inlining more

Problem reported by Norihiro Tanaka in <http://bugs.gnu.org/17229#55>.
* src/kwset.c (bmexec_trans): Rename from bmexec, and make it inline.
(bmexec): New implementation, which calls bmexec_trans.  This helps
GCC inline more aggressively with the default optimization, and
improves performance 25% with the reported benchmark on my host.
---
 src/kwset.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/kwset.c b/src/kwset.c
index 8270f05..8e9b510 100644
--- a/src/kwset.c
+++ b/src/kwset.c
@@ -584,9 +584,9 @@ memchr_kwset (char const *s, size_t n, kwset_t kwset)
   return n == 0 ? NULL : memchr2 (s, kwset->gc1, kwset->gc1help, n);
 }
 
-/* Fast boyer-moore search. */
-static size_t _GL_ATTRIBUTE_PURE
-bmexec (kwset_t kwset, char const *text, size_t size)
+/* Fast Boyer-Moore search (inlinable version).  */
+static inline size_t _GL_ATTRIBUTE_PURE
+bmexec_trans (kwset_t kwset, char const *text, size_t size)
 {
   unsigned char const *d1;
   char const *ep, *sp, *tp;
@@ -667,6 +667,17 @@ bmexec (kwset_t kwset, char const *text, size_t size)
   return -1;
 }
 
+/* Fast Boyer-Moore search.  */
+static size_t
+bmexec (kwset_t kwset, char const *text, size_t size)
+{
+  /* Help the compiler inline bmexec_trans in two ways, depending on
+     whether kwset->trans is null.  */
+  return (kwset->trans
+          ? bmexec_trans (kwset, text, size)
+          : bmexec_trans (kwset, text, size));
+}
+
 /* Hairy multiple string search. */
 static size_t _GL_ARG_NONNULL ((4))
 cwexec (kwset_t kwset, char const *text, size_t len, struct kwsmatch *kwsmatch)
-- 
1.9.0
