Am 19.07.2014 01:57, schrieb Jeff King:
> On Fri, Jul 18, 2014 at 09:14:05PM +0200, René Scharfe wrote:
>> If inlining is really better is another matter; I don't understand how
>> 1a812f3a (hashcmp(): inline memcmp() by hand to optimize) could have made
>> git gc 18% faster, as it claimed.  I would expect memcmp(), which can
>> compare more than a byte at a time, to be significantly faster -- or at
>> least just as fast as whatever the compiler does with the inlined version.
> I looked into this a while ago[1]. I think with glibc 2.13 and up, the
> memcmp is a win. We should consider switching back if that is what is
> common now.
> -Peff
> [1]

On Linux this seems to improve rev-list performance by ca. 10% for me, but
on on FreeBSD it slows it down by ca. 25%.  That's less surprising after
looking at their memcmp() implementation:

It's basically our one-byte-at-a-time inlined version, sans inlining.

I'd say if a platform doesn't bother optimizing memcmp() then they
deserve the resulting performance.  And it's probably not too bad a
penalty because such comparisons probably won't make up a significant
part of most applications.

But perhaps we can do better.  First, here are the numbers I got for
"/usr/bin/time git rev-list --all --objects v2.0.0 >/dev/null" (best
of five runs):

Debian testing amd64 with gcc 4.9.0 and libc 2.19 on Hyper-V on Intel i5-4670:

  master (398dd4bd)
  2.14user 0.02system 0:02.16elapsed 99%CPU (0avgtext+0avgdata 
  0inputs+0outputs (0major+21985minor)pagefaults 0swaps

  using memcmp
  1.92user 0.02system 0:01.95elapsed 99%CPU (0avgtext+0avgdata 
  0inputs+0outputs (0major+21987minor)pagefaults 0swaps

  four bytes at a time
  1.93user 0.03system 0:01.97elapsed 99%CPU (0avgtext+0avgdata 
  0inputs+0outputs (0major+22024minor)pagefaults 0swaps

Debian testing amd64 with clang 3.3 and libc 2.19 on Hyper-V on Intel i5-4670:

  master (398dd4bd)
  2.12user 0.04system 0:02.17elapsed 99%CPU (0avgtext+0avgdata 
  0inputs+0outputs (0major+21995minor)pagefaults 0swaps

  1.94user 0.01system 0:01.95elapsed 100%CPU (0avgtext+0avgdata 
  0inputs+0outputs (0major+21952minor)pagefaults 0swaps

  four bytes at a time
  1.86user 0.01system 0:01.88elapsed 99%CPU (0avgtext+0avgdata 
  0inputs+0outputs (0major+21909minor)pagefaults 0swaps

FreeBSD 10 amd64 with clang 3.3 on Hyper-V on Intel i5-4670c:

  master (398dd4bd)
        1.83 real         1.80 user         0.03 sys

  using memcmp
        2.32 real         2.32 user         0.00 sys

  four bytes at a time
        1.56 real         1.53 user         0.03 sys

Performance measurements in virtual machines are not very accurate, so
take them with a bag of salt, but the numbers were relatively stable
accross runs.

How come rev-list runs so much faster on FreeBSD?  Would it make sense
to apply the four-bytes-at-a-time patch?  How would it on other
platforms, especially ARM?  Can we do even better?  And more
importantly: Does it matter in the first place?

The memcmp version just replaces the body of hashcmp() with
"return memcmp(sha1, sha2, 20);".  Here's the four-bytes-at-a-time
 cache.h | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index 92fc9f1..c6cd28e 100644
--- a/cache.h
+++ b/cache.h
@@ -732,13 +732,18 @@ extern const unsigned char null_sha1[20];
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
+       const uint32_t *p1 = (const uint32_t *)sha1;
+       const uint32_t *p2 = (const uint32_t *)sha2;
        int i;
-       for (i = 0; i < 20; i++, sha1++, sha2++) {
-               if (*sha1 != *sha2)
-                       return *sha1 - *sha2;
+       for (i = 0; i < 20 / sizeof(uint32_t); i++, p1++, p2++) {
+               uint32_t v1 = htonl(*p1);
+               uint32_t v2 = htonl(*p2);
+               if (v1 < v2)
+                       return -1;
+               if (v1 > v2)
+                       return 1;
        return 0;

To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to
More majordomo info at

Reply via email to