Okay, so here's a follow-up on my progress. Apologies in advance for the long email, but I'd like to be thorough about this before I forget.

For the sake of completeness, here's my setup. I'm running Python 2.7.1 compiled from source with icc, on Ubuntu 10.10, on one of Intel's new processors (an i7-2600). The goal is to compile both numpy and scipy with Intel's compiler and Intel's MKL.

I finally got numpy to compile with icc / ifort with pretty much all of the tests passing. It was a bit of work, partly because I was trying to be an optimization junkie, but I thought I'd share my discoveries. Scipy also compiles, but with some errors (which are likely due to me not configuring f2py correctly).

First, I wanted to compile things with Intel's interprocedural optimization (-ipo) enabled, and that seems to work, but only if -O2 is used for the compiling stage and -O1 is used for the linking stage. If -O3 is given for the compiling stage, then the einsum test goes into some sort of infinite loop and hangs. If -O2 or -O3 is given to the linker, then there are random other segfaults (I forget where). With -O2 for compiling and -O1 for linking, however, things are stable. Also, if I turn off -ipo, then -O3 works fine for compiling. I'm not sure if this reflects bugs in the flags I'm passing to the Intel compiler or in icc/ifort itself.

Second, to use -ipo, it's critical that xiar is used instead of ar to create object archives. This needed to be changed in fcompiler/intel.py and intelccompiler.py. I've attached a diff of these files that gives working options for me. I don't know if these options are set in the correct place or not, but perhaps they will be helpful. The essence of it is the following (from intelccompiler.py):

    linker_flags = '-O1 -ipo -openmp -lpthread -fno-alias -xHOST -fPIC '
    compiler_opt_flags = '-static -ipo -xHOST -O2 -fPIC -DMKL_LP64 -mkl -wd188 -g -fno-alias '
    icc_run_string = 'icc ' + compiler_opt_flags
    icpc_run_string = 'icpc ' + compiler_opt_flags
    linker_run_string = 'icc ' + linker_flags + ' -shared '

with the rest of the diff setting these options.

In this case, the -openmp and -lpthread are required for linking with the threaded layer of the MKL. This could possibly be ripped out of there. Also, the -fno-alias is critical for the C compiler -- random segfaults and memory corruption occur without it. The -DMKL_LP64 is to ensure proper linking with the lp64 (32-bit indices) part of MKL, instead of the ilp64 (64-bit indices) part. The latter isn't supported by the lapack_lite module -- things compile, but don't work. -mkl may or may not help things.

For the Fortran side, this was the compiler string:

    compiler_opt_flags = '-static -ipo -xHOST -fPIC -DMKL_LP64 -mkl -wd188 -g -fno-alias -O3'

Here you don't need the -fno-alias, and -O3 seems to work.

Third, it was a bit of a pain to figure out how to get the linking/detection done correctly, as somehow order matters, and it was easy to get undefined symbols, runtime errors, etc. Very annoying. In the end, my site.cfg file looked like this:

    [DEFAULT]
    library_dirs=/usr/intel/current/mkl/lib/intel64
    include_dirs=/usr/intel/current/mkl/include
    mkl_libs = mkl_rt, mkl_core, mkl_intel_thread, mkl_intel_lp64
    blas_libs = mkl_blas95_lp64
    lapack_libs = mkl_lapack95_lp64

    [lapack_opt]
    library_dirs=/usr/intel/current/mkl/lib/intel64
    include_dirs=/usr/intel/current/mkl/include/intel64/lp64
    libraries = mkl_lapack95_lp64

    [blas_opt]
    library_dirs = /usr/intel/current/mkl/lib/intel64
    include_dirs = /usr/intel/current/mkl/include/intel64/lp64
    libraries = mkl_blas95_lp64

where /usr/intel/current/ points to my Intel install location. It's critical that the mkl_libs are given in that order. I didn't find another combination that worked.

Finally, I attached my bash setup script for environment variables. I don't know how much of a role those play in things, but I had them in place when things started working, so I should put them here.

Now, on to scipy. With all these options in place, scipy compiles fine.
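
As an aside on the site.cfg above: since the order of mkl_libs turned out to matter, a small check like the following could guard against accidentally reordering the list while experimenting. This is only a sketch under a few assumptions: it uses Python 3's standard configparser (not the Python 2.7 used for the build), the helper name lib_order is hypothetical, and it is not part of numpy.distutils. It only confirms what is written in the file, not what the linker ends up doing.

```python
import configparser

# The [DEFAULT] section from the site.cfg above, inlined so the sketch is
# self-contained. In practice one would read the real file instead.
SITE_CFG = """
[DEFAULT]
library_dirs = /usr/intel/current/mkl/lib/intel64
include_dirs = /usr/intel/current/mkl/include
mkl_libs = mkl_rt, mkl_core, mkl_intel_thread, mkl_intel_lp64
blas_libs = mkl_blas95_lp64
lapack_libs = mkl_lapack95_lp64
"""

# The one ordering that worked in this setup.
EXPECTED = ["mkl_rt", "mkl_core", "mkl_intel_thread", "mkl_intel_lp64"]


def lib_order(cfg_text, key="mkl_libs", section="DEFAULT"):
    """Return the comma-separated library names for `key`, in file order.

    `lib_order` is a hypothetical helper written for this illustration.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get(section, key)
    return [name.strip() for name in raw.split(",") if name.strip()]


if __name__ == "__main__":
    order = lib_order(SITE_CFG)
    if order != EXPECTED:
        print("mkl_libs order differs from the known-good one:", order)
    else:
        print("mkl_libs order matches the known-good one")
```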
However, there are two problems, and these don't seem to go away at any optimization level. I'm looking for suggestions. I'm guessing it's some sort of configuration error.

1) The CloughTocher2DInterpolator segfaults every time it's called to interpolate values. I couldn't manage to track it down -- it's in the cython code somewhere -- but I can give more details next time. I disabled it for now.

2) f2py isn't getting the interfaces right. When I run the test suite, I get about 250 errors, all of the form:

    ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (5,5,)

and so on, with different tuples on the end.

Other than these errors, everything seemed to work great. What might I be doing wrong there?

Thanks!
-- Hoyt

++++++++++++++++++++++++++++++++++++++++++++++++
+ Hoyt Koepke
+ University of Washington Department of Statistics
+ http://www.stat.washington.edu/~hoytak/
+ hoy...@gmail.com
++++++++++++++++++++++++++++++++++++++++++

setupenv.sh
Description: Bourne shell script
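
A note on the flag strings in the attached diff: the C-side strings (linker_flags and compiler_opt_flags in intelccompiler.py) end with a trailing space, and the diff splits flag strings with split(' '). In Python, splitting such a string on a literal space leaves an empty string at the end of the list, which would be passed along as an empty argument if the list were used directly. The sketch below only demonstrates that string behaviour with one of the flag strings from the diff. Whether it has anything to do with the commented-out get_flags override not working is an untested guess.

```python
import shlex

# Copied from the intelccompiler.py part of the diff (note the trailing space).
linker_flags = '-O1 -ipo -openmp -lpthread -fno-alias -xHOST -fPIC '

# Splitting on a literal space keeps an empty string for the trailing space.
naive = linker_flags.split(' ')

# Splitting on arbitrary whitespace drops leading, trailing and repeated spaces.
clean = linker_flags.split()

# shlex.split does the same here and would also respect shell-style quoting.
quoted = shlex.split(linker_flags)

print(naive)
print(clean)
print(clean == quoted)
```

Using split() with no argument (or shlex.split) is a small change that would make the flag lists robust to stray spaces in the strings.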
diff --git a/numpy/distutils/fcompiler/intel.py b/numpy/distutils/fcompiler/intel.py
index b593a91..aa632e6 100644
--- a/numpy/distutils/fcompiler/intel.py
+++ b/numpy/distutils/fcompiler/intel.py
@@ -10,6 +10,8 @@ compilers = ['IntelFCompiler', 'IntelVisualFCompiler',
              'IntelItaniumFCompiler', 'IntelItaniumVisualFCompiler',
              'IntelEM64VisualFCompiler', 'IntelEM64TFCompiler']
 
+compiler_opt_flags = '-static -ipo -xHOST -fPIC -DMKL_LP64 -mkl -wd188 -g -fno-alias -O3'
+
 def intel_version_match(type):
     # Match against the important stuff in the version string
     return simple_version_match(start=r'Intel.*?Fortran.*?(?:%s).*?Version' % (type,))
@@ -35,7 +37,7 @@ class IntelFCompiler(BaseIntelFCompiler):
         'compiler_f90' : [None],
         'compiler_fix' : [None, "-FI"],
         'linker_so'    : ["<F90>", "-shared"],
-        'archiver'     : ["ar", "-cr"],
+        'archiver'     : ["xiar", "-cr"],
         'ranlib'       : ["ranlib"]
         }
 
@@ -51,13 +53,13 @@ class IntelFCompiler(BaseIntelFCompiler):
         else:
             pic_flags = ['-KPIC']
         opt = pic_flags + ["-cm"]
-        return opt
+        return opt + compiler_opt_flags.split(' ')
 
     def get_flags_free(self):
         return ["-FR"]
 
     def get_flags_opt(self):
-        return ['-O3','-unroll']
+        return compiler_opt_flags.split(' ')
 
     def get_flags_arch(self):
         v = self.get_version()
@@ -129,7 +131,7 @@ class IntelItaniumFCompiler(IntelFCompiler):
         'compiler_fix' : [None, "-FI"],
         'compiler_f90' : [None],
         'linker_so'    : ['<F90>', "-shared"],
-        'archiver'     : ["ar", "-cr"],
+        'archiver'     : ["xiar", "-cr"],
         'ranlib'       : ["ranlib"]
         }
 
@@ -148,10 +150,10 @@ class IntelEM64TFCompiler(IntelFCompiler):
         'compiler_fix' : [None, "-FI"],
         'compiler_f90' : [None],
         'linker_so'    : ['<F90>', "-shared"],
-        'archiver'     : ["ar", "-cr"],
+        'archiver'     : ["xiar", "-cr"],
         'ranlib'       : ["ranlib"]
         }
-    
+
     def get_flags_arch(self):
         opt = []
         if cpu.is_PentiumIV() or cpu.is_Xeon():
diff --git a/numpy/distutils/intelccompiler.py b/numpy/distutils/intelccompiler.py
index b82445a..d7f4fd7 100644
--- a/numpy/distutils/intelccompiler.py
+++ b/numpy/distutils/intelccompiler.py
@@ -2,24 +2,39 @@ from distutils.unixccompiler import UnixCCompiler
 from numpy.distutils.exec_command import find_executable
 
+linker_flags = '-O1 -ipo -openmp -lpthread -fno-alias -xHOST -fPIC '
+compiler_opt_flags = '-static -ipo -xHOST -O2 -fPIC -DMKL_LP64 -mkl -wd188 -g -fno-alias '
+icc_run_string = 'icc ' + compiler_opt_flags
+icpc_run_string = 'icpc ' + compiler_opt_flags
+linker_run_string = 'icc ' + linker_flags + ' -shared '
+
 class IntelCCompiler(UnixCCompiler):
 
     """ A modified Intel compiler compatible with an gcc built Python.
     """
 
     compiler_type = 'intel'
-    cc_exe = 'icc'
-    cc_args = 'fPIC'
+    cc_exe = icc_run_string
+    cc_args = ''
 
     def __init__ (self, verbose=0, dry_run=0, force=0):
         UnixCCompiler.__init__ (self, verbose,dry_run, force)
-        self.cc_exe = 'icc -fPIC'
+        self.cc_exe = icc_run_string
         compiler = self.cc_exe
         self.set_executables(compiler=compiler,
                              compiler_so=compiler,
                              compiler_cxx=compiler,
                              linker_exe=compiler,
-                             linker_so=compiler + ' -shared')
+                             linker_so=linker_run_string,
+                             archiver = ["xiar", "-cr"])
+
+    # Could NOT get this to work!!!!  Grrr...
+
+    # def get_flags(self):
+    #     return compiler_opt_flags.split(' ')
+
+    # def get_flags_linker_so(self):
+    #     return linker_flags.split(' ')
 
 class IntelItaniumCCompiler(IntelCCompiler):
     compiler_type = 'intele'
@@ -32,18 +47,19 @@ class IntelItaniumCCompiler(IntelCCompiler):
 
 class IntelEM64TCCompiler(UnixCCompiler):
 
-""" A modified Intel x86_64 compiler compatible with a 64bit gcc built Python.
+    """ A modified Intel x86_64 compiler compatible with a 64bit gcc built Python.
     """
     compiler_type = 'intelem'
-    cc_exe = 'icc -m64 -fPIC'
-    cc_args = "-fPIC"
+    cc_exe = icc_run_string + " -m64"
+    cc_args = ""
     def __init__ (self, verbose=0, dry_run=0, force=0):
         UnixCCompiler.__init__ (self, verbose,dry_run, force)
-        self.cc_exe = 'icc -m64 -fPIC'
+        self.cc_exe = icc_run_string + " -m64"
         compiler = self.cc_exe
         self.set_executables(compiler=compiler,
                              compiler_so=compiler,
                              compiler_cxx=compiler,
                              linker_exe=compiler,
-                             linker_so=compiler + ' -shared')
+                             linker_so=linker_run_string,
+                             archiver = ["xiar", "-cr"])
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion