My employer has a nice new Cray XC30 (aka Cascade), and I thought I'd give Open MPI a quick test.

Given that it is INTENDED to be API-compatible with the XE series, I began configuring with

    CC=cc CXX=CC FC=ftn --with-platform=lanl/cray_xe6/optimized-nopanasas

However, since this is Intel h/w, I commented out the following 2 lines in the platform file:

    with_wrapper_cflags="-march=amdfam10"
    CFLAGS=-march=amdfam10

I am using PrgEnv-gnu/5.0.15, though PrgEnv-intel is the default on our system.

As far as I know, use of 1.6.x is out - no ugni at all, right? So, I didn't even try.

I gave openmpi-1.7rc6 a try, but the ALPS headers and libs have moved (as mentioned in ompi-trunk/config/orte_check_alps.m4). Perhaps one should CMR the updated-for-CLE-5 configure logic to the 1.7 branch?

Next, I tried a trunk nightly tarball: openmpi-1.9a1r27862.tar.bz2

As I mentioned above, the trunk has the right logic for locating ALPS. However, it looks like there is some untested code, protected by "#if WANT_CRAY_PMI2_EXT", that needs work:

    make[2]: Entering directory `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/mca/db/pmi'
      CC     db_pmi_component.lo
      CC     db_pmi.lo
    ../../../../../orte/mca/db/pmi/db_pmi.c: In function 'store':
    ../../../../../orte/mca/db/pmi/db_pmi.c:202: error: 'ptr' undeclared (first use in this function)
    ../../../../../orte/mca/db/pmi/db_pmi.c:202: error: (Each undeclared identifier is reported only once
    ../../../../../orte/mca/db/pmi/db_pmi.c:202: error: for each function it appears in.)
    make[2]: *** [db_pmi.lo] Error 1
    make[2]: Leaving directory `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/mca/db/pmi'
    make[1]: *** [all-recursive] Error 1
    make[1]: Leaving directory `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte'
    make: *** [all-recursive] Error 1

I added the missing "char *ptr" declaration a few lines before its first use, and resumed the build.

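For anyone repeating the platform-file edit mentioned earlier (commenting out the two -march=amdfam10 lines), here is a rough sketch of how it could be scripted. It works on a throwaway stand-in file rather than the real platform file. The some_other_option line and the ".intel" suffix are made up for illustration and are not part of Open MPI.

```shell
# Hypothetical stand-in for the platform file; the real one is whatever
# --with-platform=lanl/cray_xe6/optimized-nopanasas resolves to in the tree.
platform_file="$(mktemp)"
cat > "$platform_file" <<'EOF'
some_other_option=yes
with_wrapper_cflags="-march=amdfam10"
CFLAGS=-march=amdfam10
EOF

# Prefix every line that mentions the AMD-specific flag with '#',
# leaving all other settings untouched. Output goes to a separate copy.
sed '/-march=amdfam10/s/^/#/' "$platform_file" > "$platform_file.intel"

# Show the edited copy for a quick visual check.
cat "$platform_file.intel"
```

Writing to a separate copy keeps the original XE6 platform file intact, which may be handy if the same source tree is also built for an AMD-based system.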
This time the build terminated at

    make[2]: Entering directory `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/opal/tools/wrappers'
      CC     opal_wrapper.o
      CCLD   opal_wrapper
    /usr/bin/ld: attempted static link of dynamic object `../../../opal/.libs/libopen-pal.so'
    collect2: error: ld returned 1 exit status

So I went back to the platform file and changed

    enable_shared=yes

to

    enable_shared=no

No big deal there - I had to make the same change for our XE6.

And so I started back at configure (after a "make distclean", to be safe), and here is the next error:

    Making all in tools/orte-info
    make[2]: Entering directory `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/tools/orte-info'
      CCLD   orte-info
    ../../../orte/.libs/libopen-rte.a(orte_info_support.o): In function `orte_info_show_orte_version':
    orte_info_support.c:(.text+0xd70): multiple definition of `orte_info_show_orte_version'
    version.o:version.c:(.text+0x4b0): first defined here
    ../../../orte/.libs/libopen-rte.a(orte_info_support.o):(.data+0x0): multiple definition of `orte_info_type_orte'
    orte-info.o:(.data+0x10): first defined here
    /usr/bin/ld: link errors found, deleting executable `orte-info'
    collect2: error: ld returned 1 exit status
    make[2]: *** [orte-info] Error 1

I am not sure how to fix this, but I would guess this is probably a simple fix for somebody who knows OMPI's build infrastructure better than I.

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900