Jonathan Zylstra wrote about the new Apple G4 processor (see quotes below).
The G4 is really a combination of the basic PowerPC CPU and a 128-bit
vector unit called AltiVec. Richard Crandall (who consults for Apple)
and Jason Klivington (of Apple) used a beta version of the G4 to do some of
the all-integer verification of our (Crandall, Mayer, Papadopoulos) F24
(24th Fermat number) project. Note that we didn't get a chance to test the
floating-point capabilities of the G4: both main "wavefront" runs of the
Pépin test of F24 were on other hardware (a 250 MHz MIPS R10000 and a
167 MHz SPARC Ultra-1), and both achieved a time per FFT-based squaring
(FFT length 1 million) of under 1 second, whereas the all-integer version
needed about 5 times as long on a 400 MHz G4. One can't conclude anything
from that, since it's like comparing apples and oranges.
Based on the technical specs and the relative all-integer timings on the
G4 vs. the Pentium (with similar amounts of code optimization, the G4 looks
to be somewhat faster than a Pentium, but by less than a factor of 2), it
appears to be a good processor, but all the ballyhoo needs to be put into
perspective.
Before I continue, a disclaimer: I neither work nor consult for any computer
manufacturer. My taste in processors is simple: the faster, the better.
<<<"sustained performance of 1 gigaflop." (which makes the G4 a
'supercomputer')
"theoretical performance of 4 gigaflops"
"It is a 128 bit processor, and can perform 4 (sometimes 8) 32bit
floating pt. calculations per cycle."
"It is 3 times faster then the PIII 600Mhz">>>
While all of this may be technically true, it's also very misleading:
- The 1 Gflop figure comes from the fact that the G4 can in theory dispatch 2
floating-point operations (1 mul, 1 add) per cycle, so at 500 MHz that equals
1 Gflop. I defy you to get that kind of performance out of, say, a code for a
large FFT; with very careful coding, you may get half that.
The above FP capabilities are not qualitatively different from those of most
other current high-end processors (Pentium, SPARC, MIPS, Alpha) and are
slightly less than those of the AMD K7, which can dispatch up to 3 FP ops per
cycle (2 adds, 1 mul). Thus, by the same reasoning, AMD could legitimately
claim 2 Gflops for a 667 MHz K7. Never mind that one will only see that kind
of performance for perfectly balanced, perfectly pipelined code whose data
never leave the FP registers.
- 128-bit: also misleading. The AltiVec vector unit can do some fairly
nice operations in which a 128-bit integer operand is treated as a vector
of 4, 8, or 16 operands of 32, 16, or 8 bits, respectively, and can do
a nice variety of 4x32-bit vector FP operations, but in my opinion that
is far from constituting a true 128-bit CPU. The above enhancements are
qualitatively similar to the Pentium MMX enhancements: useful for things
like multimedia, but nearly useless when one is doing serious math with
64-bit operands. I say "nearly" since one can, e.g., build various 64-bit
integer operations out of ones on shorter operands, but it's a pain if,
say, one wants a 64x64==>128-bit integer multiply, which is potentially
more useful for LL testing than parallel 16x16==>32-bit multiplies. The
AltiVec 4x32-bit floats, like the similar SSE instructions on the Pentium
III, are not very useful for FFT-based large-integer arithmetic: not
enough precision.
- "It is 3 times faster then the PIII 600Mhz." Based on what? Give us some
SPECint or SPECfp figures that support this before making such claims. The
only datum provided (below) indicates a speedup of 1.45x, far less than 3x.
<<<On the comparison table between the PIII and G4, they show this:
Test         PIII Clock Cycles  G4 Clock Cycles  G4 Performance          G4 Performance (Adjusted for MHz)
256 Pt. FFT  6.94               4                1.74x better than PIII  1.45x faster than PIII>>>
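The throughput arithmetic in the points above is simple enough to check mechanically. The Python sketch below computes a theoretical peak as (FP ops dispatched per cycle) x (clock rate), and the clock-adjusted speedup from the 256-point FFT cycle counts in the table. The function names are illustrative only, and the 500 MHz G4 clock is an assumption, chosen because it is the clock that reproduces the quoted 1.45x figure.

```python
def peak_gflops(fp_ops_per_cycle: int, clock_mhz: float) -> float:
    """Theoretical peak: FP ops dispatched per cycle times clock rate, in Gflop/s."""
    return fp_ops_per_cycle * clock_mhz / 1000.0


def clock_adjusted_speedup(cycles_a: float, cycles_b: float,
                           mhz_a: float, mhz_b: float) -> float:
    """Wall-clock speedup of chip B over chip A on the same task.

    Run time is cycles / clock rate, so the per-cycle advantage
    (cycles_a / cycles_b) is scaled by the clock ratio (mhz_b / mhz_a).
    The two cycle counts only need to be in the same units.
    """
    return (cycles_a / cycles_b) * (mhz_b / mhz_a)


if __name__ == "__main__":
    # G4: 1 mul + 1 add per cycle at 500 MHz; K7: up to 3 FP ops per cycle at 667 MHz.
    print(f"G4 peak: {peak_gflops(2, 500):.2f} Gflop/s")
    print(f"K7 peak: {peak_gflops(3, 667):.2f} Gflop/s")
    # 256-pt FFT: PIII 6.94 vs G4 4 (table units); PIII at 600 MHz, G4 assumed at 500 MHz.
    print(f"per cycle: {6.94 / 4:.2f}x")
    print(f"adjusted for clock: {clock_adjusted_speedup(6.94, 4, 600, 500):.2f}x")
```

Both peak figures assume every cycle issues the maximum number of FP ops, which is exactly the assumption that real, memory-bound code violates.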
In what way is a tiny 256pt FFT a good indicator of overall system
performance?
Was this single precision (I suspect so) or double? Was it specially coded
to use the 4x32-bit FP ops supported by the G4? (I suspect so.) If so, was
similar coding attempted to use the PIII's SSE instructions? (I suspect not.)
What numbers emerge when the figures are adjusted not just for MHz, but also
for price?
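To make the "it's a pain" remark about 64-bit integer arithmetic (in the 128-bit point above) concrete: with only 16x16==>32-bit multiplies available, a single 64x64==>128-bit product needs 16 partial products plus carry propagation. The scalar Python sketch below shows the schoolbook construction. It is not AltiVec code, and `mul16x16` is a hypothetical stand-in for a hardware 16-bit multiply. A vector unit could form several partial products in parallel, but the carry handling still costs extra instructions.

```python
MASK16 = 0xFFFF


def mul16x16(a: int, b: int) -> int:
    """Hypothetical stand-in for a hardware 16x16 ==> 32-bit unsigned multiply."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b


def split16(x: int) -> list[int]:
    """Split a 64-bit unsigned integer into four 16-bit limbs, least significant first."""
    return [(x >> (16 * i)) & MASK16 for i in range(4)]


def mul64x64_to_128(x: int, y: int) -> int:
    """Schoolbook 64x64 ==> 128-bit multiply using only 16x16 ==> 32-bit products."""
    assert 0 <= x < 2**64 and 0 <= y < 2**64
    xs, ys = split16(x), split16(y)
    limbs = [0] * 8  # eight 16-bit limbs of the 128-bit result
    for i in range(4):
        carry = 0
        for j in range(4):
            # partial product + existing limb + carry is at most 2**32 - 1,
            # so each step fits in a 32-bit register.
            t = mul16x16(xs[i], ys[j]) + limbs[i + j] + carry
            limbs[i + j] = t & MASK16
            carry = t >> 16
        limbs[i + 4] = carry  # not yet written by any earlier row
    return sum(limb << (16 * k) for k, limb in enumerate(limbs))


if __name__ == "__main__":
    a, b = 0x0123456789ABCDEF, 0xFEDCBA9876543210
    assert mul64x64_to_128(a, b) == a * b  # 16 partial products for one result
    print(hex(mul64x64_to_128(a, b)))
```

A native 64x64==>128-bit multiply does all of this in one or two instructions, which is why it matters more for LL testing than wide parallel 16-bit multiplies.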
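The precision objection to 4x32-bit AltiVec floats can be roughed out with a worst-case bound. When squaring an N-digit number with digits in [0, 2^b), each convolution output can reach N(2^b - 1)^2, so rounding the FFT result back to exact integers needs roughly 2b + log2(N) bits of significand: 24 for IEEE single precision versus 53 for double. The sketch below applies only this bound; it ignores FFT roundoff error (which makes it optimistic) and tricks such as balanced-digit representation (which make it pessimistic), so treat its numbers as illustrative rather than as limits any real code obeys. The function names are hypothetical.

```python
import math

FLOAT32_SIGNIFICAND_BITS = 24  # IEEE 754 single precision, incl. implicit bit
FLOAT64_SIGNIFICAND_BITS = 53  # IEEE 754 double precision, incl. implicit bit


def worst_case_output_bits(fft_length: int, digit_bits: int) -> float:
    """log2 of the largest possible coefficient when squaring fft_length
    digits, each in [0, 2**digit_bits), via a length-fft_length convolution."""
    max_coeff = fft_length * (2**digit_bits - 1) ** 2
    return math.log2(max_coeff)


def max_digit_bits(fft_length: int, significand_bits: int) -> int:
    """Largest digit size whose worst-case outputs fit exactly in the
    significand (0 if even 1-bit digits do not fit)."""
    b = 0
    while worst_case_output_bits(fft_length, b + 1) < significand_bits:
        b += 1
    return b


if __name__ == "__main__":
    for n in (256, 2**20):
        print(n, max_digit_bits(n, FLOAT32_SIGNIFICAND_BITS),
              max_digit_bits(n, FLOAT64_SIGNIFICAND_BITS))
```

Under this crude bound, a length-2^20 (roughly "1 million") transform leaves room for 16-bit digits in double precision but only 2-bit digits in single precision.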
I've also seen some of Apple's ads in the San Jose Mercury News, where the
phrase "The fastest desktop computer on earth" was used. Apparently this
depends on one's definition of both "fastest" and "desktop computer" (perhaps
even of "Earth" :). I can buy a desktop Alpha 21264 that probably blows the G4
out of the water, performance-wise (and there, we HAVE some SPEC numbers to
guide us), on 95% of generic compiled code, such as that in Numerical Recipes
or the SPEC benchmark suites. So again, folks at Apple, please back your
claims up with performance data based on real-world code, the kind most
programmers really write, that tests more of the instruction set (not just
the special goodies you included) as well as the entire memory system (not
just the registers and L1 cache).
Now, if the G4 demonstrates a SPECfp score of over 50 at 500 MHz and a
SPECint score of over 30, then the "fastest on earth" claim will be more
believable.
None of which is to say it's not a darn good processor - but let's keep the
descriptive language within the realm of the reasonable, shall we?
Some related URLs (thanks to Jason Klivington of Apple for these):
A technical discussion of the instruction set:
http://www.motorola.com/SPS/PowerPC/teksupport/teklibrary/altivec_pem.pdf
A description of the C implementation of the AltiVec instructions:
http://developer.apple.com/hardware/altivec/pdf/altivec_support.pdf
Motorola's general AltiVec site:
http://www.mot.com/SPS/PowerPC/AltiVec
Have fun,
-Ernst
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers