On 1/19/2012 6:29 AM, Richard Guenther wrote:
On Thu, Jan 19, 2012 at 3:27 PM, willus.com <willus....@willus.com> wrote:
On 1/19/2012 2:59 AM, Richard Guenther wrote:
On Thu, Jan 19, 2012 at 7:37 AM, Marc Glisse <marc.gli...@inria.fr> wrote:
On Wed, 18 Jan 2012, willus.com wrote:
For those who might be interested, I've recently benchmarked gcc 4.6.3
(and 3.4.2) vs. Intel v11 and Microsoft (in Windows 7) here:
http://willus.com/ccomp_benchmark2.shtml
http://en.wikipedia.org/wiki/Microsoft_Windows_SDK#64-bit_development
For the math functions, this is normally more a libc feature, so you might
get very different results on different OSes. Then again, by using
-ffast-math, you allow the math functions to return any random value, so I
can think of ways to make it even faster ;-)
Also, for math functions you can simply substitute the Intel compiler's ones
(GCC uses the Microsoft ones) by linking against libimf. You can also make
use of their vectorized variants from GCC by specifying -mveclibabi=svml
and linking against libimf (the GCC autovectorizer will then use the routines
from the Intel compiler math library). That makes a huge difference for
code using functions from math.h.
Richard.
--
Marc Glisse
Thank you both for the tips. Are you certain that with the flags I used
Intel doesn't completely in-line the math.h functions at the compile stage?
Yes. Intel merely comes with its own (optimized) math library while GCC
has to rely on the one provided by the operating system.
Wouldn't it be possible to in-line the standard C math functions in
math.h, though, if the correct compiler flags were set? I realize this
could be a big task and would potentially have a lot of dependencies on
the CPU flag settings, but is it at least conceivable? Or is it highly
undesirable for some reason? (I almost don't see how Intel could be so
fast on some of those functions without in-lining.)
Alternatively, is anybody developing an open-source, x86-based or
x64-based fast math library (just for standard C math functions--I don't
need vector/array/BLAS/etc.) that auto-detects and takes advantage of
modern CPU capabilities?