We currently use icc -xP:

"Can generate SSE3, SSE2, and SSE instructions for Intel processors,
and it can optimize for processors based on Intel(R) Core(TM)
microarchitecture and Intel NetBurst(R) microarchitecture, like
Intel(R) Core(TM) Duo processors, Pentium(R) 4 processors with SSE3,
and Intel(R) Xeon(R) processors with SSE3. This is the default on
Mac OS X systems using IA-32 architecture."

But what we apparently need on Opteron-based systems like Ranger is
icc -xW:

"Can generate SSE2 and SSE instructions, and it can optimize for
Intel(R) Pentium(R) 4 processors and Intel(R) Xeon(R) processors
with SSE2. This is the default on Linux systems using Intel(R) 64
architecture.  This option is the same as specifying
-march=pentium4."

I could just switch the optimizations in our aclocal.m4 file to use
the more conservative -xW.  It's better for us to always work and
sometimes use suboptimal compiler flags than to sometimes work and
sometimes use broken compiler flags.

Alternatively, it would be nice if we could autodetect what we're
running on at configure time.  I don't suppose there's any
cross-platform way to do "grep sse3 /proc/cpuinfo"?
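
As a rough illustration of what such a configure-time check might look
like, here is a minimal shell sketch.  The helper names has_sse3 and
detect_cpu_features are hypothetical and not part of our build system.
It assumes either a Linux-style /proc/cpuinfo or a Mac OS X-style
"sysctl machdep.cpu.features"; anything else is reported as "no".  Note
that Linux lists SSE3 under the flag name "pni", so a plain
"grep sse3" there would only match "ssse3", which is a different
extension.  The sketch only answers "does the build host report SSE3?";
whether -xP binaries run correctly on a non-Intel CPU is a separate
question it does not address.

```shell
# Hypothetical helper: print "yes" if a whitespace-separated CPU feature
# list contains SSE3, else "no". Whole-word matching keeps "ssse3" from
# being mistaken for "sse3".
has_sse3() {
  for feature in $1; do
    case "$feature" in
      pni|sse3|SSE3) echo yes; return 0 ;;
    esac
  done
  echo no
}

# Hypothetical helper: print the build host's feature list, or nothing
# if neither source is available.
detect_cpu_features() {
  if [ -r /proc/cpuinfo ]; then
    # Linux: take the first "flags : ..." line.
    sed -n 's/^flags[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo | head -n 1
  elif command -v sysctl >/dev/null 2>&1; then
    # Mac OS X on Intel: features are listed in upper case.
    sysctl -n machdep.cpu.features 2>/dev/null || true
  fi
}

echo "SSE3 on this host: $(has_sse3 "$(detect_cpu_features)")"
```

A check like this describes the machine running configure, which on a
cluster is not necessarily the machine that runs the code.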

Finally, it looks like icc 10 has the option to generate multiple code
paths; we could use "-xW -axP" to use SSE3 where it's supported and
fall back to SSE2-only otherwise.  This sounds like a way to bloat
library and executable size, though.
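
One way to put a number on the size concern would be to build the same
file both ways and compare the results.  The sketch below is only an
illustration: size_growth_percent is a hypothetical helper, and the
file names in the usage comment are placeholders.  It assumes the
baseline file is non-empty.

```shell
# Hypothetical helper: print how much larger file $2 is than file $1,
# as a whole-number percentage (integer division, rounds toward zero).
size_growth_percent() {
  base=$(wc -c < "$1")
  multi=$(wc -c < "$2")
  echo $(( (multi - base) * 100 / base ))
}

# Possible usage, assuming icc 10 is on PATH and foo.C is any source file:
#   icc -xW      -c foo.C -o foo_xW.o
#   icc -xW -axP -c foo.C -o foo_xW_axP.o
#   size_growth_percent foo_xW.o foo_xW_axP.o
```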

Any preferences?
---
Roy

_______________________________________________
Libmesh-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libmesh-devel