gzip's i386 assembly code, enabled by default in the FreeBSD source tree,
performs poorly on i686 cores (PPro/P2/P3). This is due to the 'partial
register stall' problem (these cores stall when code writes part of a
register, such as %al, and then reads the full register, such as %eax),
explained at a URL recently brought up on the list:
http://www.emulators.com/pentium4.htm
In the course of learning more about partial register stalls I came across
the following i686 and i586 assembly optimizations for gzip:
http://www.muppetlabs.com/~breadbox/software/assembly.html.
This optimized i686 asm avoids partial register stalls and is 20-40%
faster, with higher compression levels benefiting most from the patch.
The i586 patch is usually only about 5% faster, but in some cases
achieves a 25% speedup.
For completeness, I also ran some tests on a non-asm gcc 2.95.2 compile,
with and without -march=pentiumpro. Here are the results for compressing
linux-2.4.2.tar with --best on a Pentium II 400 (three runs, averaged,
caches warmed with some throwaway runs first):
[type]                            [user secs]  [time (as % of slowest)]
i386 asm:                             175          100%
no asm, -O:                           142          81.1%
no asm, -O2:                          139          79.4%
no asm, -O -march=pentiumpro:         136          77.7%
no asm, -O2 -march=pentiumpro:        140          80.0%
i686 asm:                             124          70.8%
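
For anyone who wants to run comparable tests, here is a minimal sketch of
how the procedure above (throwaway warm-up runs, then averaged timed runs,
then each result as a percentage of the slowest) could be scripted in
Python. It is not the script used for the table; the function names are
made up for illustration, and it assumes a Unix system where the resource
module is available.

```python
import resource
import subprocess


def child_user_seconds(cmd):
    """Run cmd to completion and return the user CPU seconds it used."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    # Discard stdout so that writing the compressed output is not a factor.
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    return after - before


def mean_user_seconds(cmd, runs=3, warmups=2):
    """Average user time over `runs` runs, after `warmups` untimed runs."""
    for _ in range(warmups):
        child_user_seconds(cmd)  # throwaway run to warm the caches
    samples = [child_user_seconds(cmd) for _ in range(runs)]
    return sum(samples) / len(samples)


def percent_of_slowest(results):
    """Map each build name to its time as a percentage of the slowest."""
    slowest = max(results.values())
    return {name: 100.0 * secs / slowest for name, secs in results.items()}
```

A call such as mean_user_seconds(["./gzip", "--best", "-c",
"linux-2.4.2.tar"]) would time one build (the binary path here is a
placeholder); collecting one such mean per build in a dict and passing it
to percent_of_slowest() gives the last column of the table.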
I'm interested in other people's results/tests. In particular, I still
need to do some runs with -mcpu=pentiumpro, which only tunes scheduling
for the PPro and keeps the binary runnable on older CPUs.
An important part of the equation is making sure that dropping the i386
asm doesn't hurt i586 machines. I did several tests on a Pentium 200MMX;
the i386 asm and the gcc-emitted asm are not measurably different on that
CPU.
Brian Raiter ([EMAIL PROTECTED], author of the i586/i686 asm patches)
has contacted the gzip maintainers, but it's been years since the last
gzip release and there may never be another. I have seen a 1.2.4a release
which had his files in a contrib/ directory, but they were not active in
any way.
Since I would imagine a large percentage of FreeBSD users run on i686
cores, it'd be great to get this pretty significant speed increase into our
tree.
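The i686 patch is neat (about 30% less time than the i386 asm!) but its
improvement over gcc's emitted assembly is small (124 vs. 136 user secs
at best, about 9%). Since plain gcc output already takes roughly 20% less
time than the old i386 assembly on this CPU, disabling that assembly seems
a good first step. Attached is a patch that disables the custom asm.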
I'm interested in hearing everyone's comments.
Aaron
Index: Makefile
===================================================================
RCS file: /usr/cvs/src/gnu/usr.bin/gzip/Makefile,v
retrieving revision 1.21
diff -u -r1.21 Makefile
--- Makefile 1999/08/27 23:35:48 1.21
+++ Makefile 2001/03/20 23:59:48
@@ -8,11 +8,6 @@
CFLAGS+=-DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DDIRENT=1
GREP_LIBZ?= YES
-.if ${MACHINE_ARCH} == "i386"
-SRCS+= match.S
-CFLAGS+=-DASMV
-.endif
-
MLINKS= gzip.1 gunzip.1 gzip.1 zcat.1 gzip.1 gzcat.1
MLINKS+= zdiff.1 zcmp.1