We currently use icc -xP: "Can generate SSE3, SSE2, and SSE instructions for Intel processors, and it can optimize for processors based on Intel(R) Core(TM) microarchitecture and Intel NetBurst(R) microarchitecture, like Intel(R) Core(TM) Duo processors, Pentium(R) 4 processors with SSE3, and Intel(R) Xeon(R) processors with SSE3. This is the default on Mac OS X systems using IA-32 architecture."
But what we apparently need on Opteron-based systems like Ranger is icc -xW: "Can generate SSE2 and SSE instructions, and it can optimize for Intel(R) Pentium(R) 4 processors and Intel(R) Xeon(R) processors with SSE2. This is the default on Linux systems using Intel(R) 64 architecture. This option is the same as specifying -march=pentium4."

I could just switch the optimizations in our aclocal.m4 file to use the more conservative -xW. It's better for us to always work, sometimes with suboptimal compiler flags, than to work only sometimes because the flags are broken on some machines.

Alternatively, it would be nice if we could autodetect what we're running on at configure time. I don't suppose there's any cross-platform way to do "grep sse3 /proc/cpuinfo"?

Finally, it looks like icc 10 has the option to generate multiple code paths; we could use "-xW -axP" to use SSE3 where it's supported and fall back to SSE2-only otherwise. This sounds like a way to bloat library and executable size, though.

Any preferences?
---
Roy
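For the autodetection idea, here is a rough sketch of what a configure-time check might look like in plain sh. It is only an illustration under several assumptions, not something taken from our aclocal.m4: the function names `detect_cpu` and `choose_icc_xflag` are made up, the Mac OS X branch assumes the `machdep.cpu.vendor` and `machdep.cpu.features` sysctl keys are available, and the rule "only pick -xP on GenuineIntel CPUs" is a deliberately conservative guess prompted by the Opteron trouble described above. Note that Linux lists SSE3 as "pni" in /proc/cpuinfo, so a literal grep for "sse3" would only ever match "ssse3".

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any existing build system.

# detect_cpu: set cpu_vendor and cpu_flags from whatever the platform exposes.
detect_cpu() {
    if [ -r /proc/cpuinfo ]; then
        # Linux: take the first "vendor_id" and "flags" lines.
        cpu_vendor=$(sed -n 's/^vendor_id[^:]*: *//p' /proc/cpuinfo | head -n 1)
        cpu_flags=$(sed -n 's/^flags[^:]*: *//p' /proc/cpuinfo | head -n 1)
    elif sysctl -n machdep.cpu.features >/dev/null 2>&1; then
        # Mac OS X on Intel (assumed sysctl keys); features are upper case there.
        cpu_vendor=$(sysctl -n machdep.cpu.vendor)
        cpu_flags=$(sysctl -n machdep.cpu.features | tr '[:upper:]' '[:lower:]')
    else
        # Unknown platform: report nothing so the caller falls back to -xW.
        cpu_vendor=unknown
        cpu_flags=
    fi
}

# choose_icc_xflag VENDOR FLAGS: print -xP only for an Intel CPU that
# advertises SSE3 ("pni" on Linux, "sse3" on Mac OS X); otherwise print -xW.
choose_icc_xflag() {
    case "$1" in
        *GenuineIntel*) ;;
        *) echo "-xW"; return 0 ;;
    esac
    case " $2 " in
        *" pni "*|*" sse3 "*) echo "-xP" ;;
        *) echo "-xW" ;;
    esac
}

detect_cpu
echo "suggested icc flag: $(choose_icc_xflag "$cpu_vendor" "$cpu_flags")"
```

A check like this only describes the build host, so it would still pick the wrong flag when the compile node and the compute nodes differ; the conservative -xW default avoids that problem entirely.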
