Re: -O2 considered harmful

Bruce Cran Thu, 27 Feb 2003 15:57:49 -0800

On Thu, Feb 27, 2003 at 09:49:13PM +0000, Nuno Teixeira wrote:
> On Thu, Feb 27, 2003 at 08:38:00AM +0000, Bruce Cran wrote:
> > I'm afraid you're wrong - the V2SI datatype and MMX functions automatically
> > become available after -march=pentium2, while with other processor types
> > you've got to explicitly add -mmmx. -msse is presumed with -march=pentium3 
> > and up.  It's far from absurd to use mmx for everyday applications - sure,
> > only a few applications may take advantage of it, but I've seen code which
> > runs 40x faster when compiled for athlon-xp than for i386, and I would guess
> > that a lot of that is because of clever use of sse and mmx.   That wasn't
> > an audio/video program, it was the libgmp arbitrary precision maths
> > package.  Also, I'm sure most people wouldn't say no to 50% more
> > processing speed for free!
> > So, if you've got a pentium, k6 or pentiumpro which supports MMX, you _do_ 
> > need to explicitly add -mmmx, but for other processors it's implied.
> >


> I searched the gcc docs and didn't find any info about what you say here.
> I'm seeing lots of people using e.g. athlon-xp with -mmmx and -m3dnow
> included. I'm confused about whether these optimizations are implied or
> not by processors that support them.
> 

I, too, use -mmmx -msse -m3dnow with -march=athlon-xp.   I do it simply
because I don't trust gcc enough to do it for me - people have shown in
the past that -O2 and -O3 don't activate all the optimizations which
the docs claim they should, which is why you see people adding crazy stuff
like -funroll-loops -fomit-frame-pointer -fschedule-insns2 -fgcse ...
The surest way to find out at which point gcc enables the vector
extensions is, if you've got access to a suitable computer, to compile the
following:

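typedef float v4sf __attribute__ ((vector_size (16)));  /* 4 packed floats */

int main(void)
{
        v4sf a = {1, 2, 3, 4};
        v4sf b = {5, 6, 7, 8};
        /* ADDPS: element-wise add, rejected unless SSE support is enabled */
        v4sf c = __builtin_ia32_addps(a, b);
        (void)c;        /* result unused; we only care whether this compiles */
        return 0;
}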

This will only compile when gcc has enabled sse instruction support.  I've 
found that this happens when you use -msse on its own, even with
-march=pentium, and when you use -march=pentium3, -march=pentium4,
-march=athlon-xp etc. without any extra -msse.   In addition, when compiling
mmx code, -m3dnow implies -mmmx, which makes sense since 3dnow is just an
extension of mmx.

Of course, what many people don't realise is that gcc, unlike icc, will not
produce any sse instructions unless either -mfpmath=sse is given, so that
sse is used for all scalar floating point math, or vector instructions are
explicitly coded for, as above.  So for most software, adding the extra
flags shouldn't make the slightest difference, but a few applications will
detect the vector unit and use it, resulting in a sometimes significant
performance gain.   You should notice a fairly large increase in performance
when using the sse unit because, unlike mmx and 3dnow, which share their
registers with the 387, it is a separate functional unit and so was designed
to be fast instead of being crippled by backward compatibility with the 387.
Indeed, with sse2, Intel seem to have finally gained a very decent vector
processing unit which can compete with the vector units of similar
processors, such as AltiVec on the G4.

Bruce Cran

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message
