Hi Jerome,

I am in part responsible for the current incarnation of the ALPS support in OMPI. We use the modules environment to set OMPI_ALPS_RESID to the ALPS reservation ID. The pertinent parts of the modulefile are:

    set ridpath ${basedir}/share/openmpi
    set ridname ras-alps-command.sh
    set rid ${ridpath}/${ridname}
    # Set local cluster parameters for XT5.
    set resId [exec /bin/bash ${rid}]
    setenv OMPI_ALPS_RESID $resId

Originally, the Cray XT systems automatically set an environment variable, BATCH_PARTITION_ID, to the ALPS reservation ID for the job. However, newer versions do not expose the ALPS reservation ID to the user, so we need another way to get the ALPS reservation ID of the Torque job. Unfortunately, Cray has not made the internal structure of ALPS that does this available, so we are forced to use apstat to get this information. But apstat is not as robust as we might like. Ergo, the script loops on apstat until it does not fail. In the end, we obtain the ALPS reservation ID for the current Torque job and assign it to OMPI_ALPS_RESID. I chose this name to avoid namespace conflicts.

So, the ALPS RAS mca is being selected because your patch tells the ALPS RAS mca that BASIL_RESERVATION_ID is equivalent to OMPI_ALPS_RESID. In turn, when you invoke OMPI with mpirun, the OMPI version of mpirun will select the ALPS PLM mca. This will launch your job with an aprun (under the covers). So, your job does show a successful run. However, you may not be taking the path through mpirun that you intended.

I do hope that I have cleared up some confusion.

--
Ken Matney, Sr.
Oak Ridge National Laboratory

On Jul 9, 2010, at 6:19 AM, Jerome Soumagne wrote:

Hi,

We've recently installed OpenMPI on one of our Cray XT5 machines, here at CSCS. This machine uses SLURM for launching jobs. Doing an salloc defines this environment variable:

    BASIL_RESERVATION_ID
        The reservation ID on Cray systems running ALPS/BASIL only.

Since the alps ras module tries to find a variable called OMPI_ALPS_RESID, which is set using a script, we thought that for SLURM systems it would be a good idea to integrate this BASIL_RESERVATION_ID variable directly in the code, rather than using a script.
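For illustration only, the lookup this implies can be sketched in shell. This is not taken from the patch: the function name alps_resid is hypothetical, and the precedence (OMPI_ALPS_RESID first, then BASIL_RESERVATION_ID) is an assumption.

```shell
# Hypothetical helper, not part of Open MPI or of the attached patch.
# Prints the ALPS reservation ID from the environment, if one is set.
# Assumption: OMPI_ALPS_RESID wins when both variables are defined.
alps_resid() {
    if [ -n "${OMPI_ALPS_RESID:-}" ]; then
        # Value exported by the modulefile/script approach.
        printf '%s\n' "$OMPI_ALPS_RESID"
    elif [ -n "${BASIL_RESERVATION_ID:-}" ]; then
        # Value defined by salloc on Cray systems running ALPS/BASIL.
        printf '%s\n' "$BASIL_RESERVATION_ID"
    else
        # Neither variable is set: report failure to the caller.
        return 1
    fi
}
```

Inside an salloc session, something like `resid=$(alps_resid) || echo "no ALPS reservation ID found"` would pick up the SLURM-provided value without running any extra script.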
The small patch is attached.

Regards,

Jerome

--
Jérôme Soumagne
Scientific Computing Research Group
CSCS, Swiss National Supercomputing Centre
Galleria 2, Via Cantonale  | Tel: +41 (0)91 610 8258
CH-6928 Manno, Switzerland | Fax: +41 (0)91 610 8282

<patch_slurm_alps.txt>