Thanks. I think I found a culprit: on x86, GCC does not optimize cmp's word-at-a-time comparison well when the word type is 'long long' (uintmax_t). I installed this patch, which should make 'cmp' I/O-bound on your platform:
From 43c2c0fd6ceedccd5a717ae0de3b1fcf857d27d9 Mon Sep 17 00:00:00 2001
From: Paul Eggert <[email protected]>
Date: Mon, 12 Aug 2013 16:24:01 -0700
Subject: [PATCH] cmp: tune 'cmp a b' for GCC x86
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Performance problem reported by David Balažic in:
http://lists.gnu.org/archive/html/bug-diffutils/2013-08/msg00013.html
* src/system.h (word): Make it size_t, not uintmax_t.
This sped up plain cmp 90% on my tests (GCC 4.8.1, x86).
---
 src/system.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/system.h b/src/system.h
index 827df96..f39fff0 100644
--- a/src/system.h
+++ b/src/system.h
@@ -119,10 +119,12 @@ int strcasecmp (char const *, char const *);
 #include "propername.h"
 #include "version.h"
 
-/* Type used for fast comparison of several bytes at a time.  */
+/* Type used for fast comparison of several bytes at a time.
+   This used to be uintmax_t, but changing it to size_t
+   made plain 'cmp' 90% faster (GCC 4.8.1, x86).  */
 #ifndef word
-# define word uintmax_t
+# define word size_t
 #endif
 
 /* The integer type of a line number.  Since files are read into main
-- 
1.7.11.7

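For anyone wondering why the type matters: on 32-bit x86, uintmax_t is a 64-bit 'long long' that does not fit in one general-purpose register, whereas size_t is 32 bits and does. Below is a minimal sketch of a word-at-a-time comparison, to show where the 'word' type enters the picture. It is not the actual diffutils code; the function name first_difference is hypothetical, and the use of memcpy for the loads is my assumption for the sake of a portable example.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the 'word' macro from src/system.h (value as in the patch).  */
#ifndef word
# define word size_t
#endif

/* Return the offset of the first byte at which A and B differ,
   or N if their first N bytes are equal.
   Hypothetical helper for illustration, not a diffutils function.  */
static size_t
first_difference (char const *a, char const *b, size_t n)
{
  size_t i = 0;

  /* Compare one word at a time.  memcpy sidesteps alignment and
     aliasing issues; compilers typically turn it into a plain load.  */
  for (; i + sizeof (word) <= n; i += sizeof (word))
    {
      word wa, wb;
      memcpy (&wa, a + i, sizeof wa);
      memcpy (&wb, b + i, sizeof wb);
      if (wa != wb)
        break;
    }

  /* Finish byte by byte: this pins down the exact offset inside a
     differing word and handles any tail shorter than a word.  */
  for (; i < n; i++)
    if (a[i] != b[i])
      break;

  return i;
}
```

Building this with -Dword=uintmax_t (plus #include <stdint.h>) versus the default size_t, and comparing the generated assembly with 'gcc -m32 -O2 -S', should show the difference in the inner loop, though the exact output will depend on the GCC version.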