Christopher,
I cannot reproduce your problem on my freshly installed 1.6.1rc2. I used the attached program, which is essentially your test case with a small modification to make it compilable.

But what I do see is a timeout somewhere in the initialization stage: if you start processes on nodes in another IB island without explicitly specifying which interface to use for startup communication, it hangs for some 20 seconds. (I think Open MPI tries to communicate over unconnected Ethernet interfaces and runs into a timeout.) Thus we use this:
-mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0
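
The same two settings can also be supplied through the environment instead of the command line, since Open MPI reads MCA parameters from variables named OMPI_MCA_<parameter>. A minimal sketch, assuming the IPoIB interface is called ib0 on every node as it is here (on other clusters the name will likely differ):

```shell
# Assumption: ib0 is the IPoIB interface name on all nodes; substitute your own.
export OMPI_MCA_oob_tcp_if_include=ib0   # interface for startup (out-of-band) traffic
export OMPI_MCA_btl_tcp_if_include=ib0   # interface for the TCP BTL

# With these exported, the -mca options can be dropped from the command line:
# mpiexec -np 4 -H linuxscc005,linuxscc004 a.out
```

This is only a convenience for repeated runs; the -mca options on the command line have the same effect.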

Nevertheless, I cannot reproduce your initial issue with 1.6.1rc2 in our environment.

Best
Paul Kapinos



$ time /opt/MPI/openmpi-1.6.1rc2mt/linux/intel/bin/mpiexec -mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0 -np 4 -H linuxscc005,linuxscc004 a.out
linuxscc004.rz.RWTH-Aachen.DE(3) of 4 provided=(3)
linuxscc005.rz.RWTH-Aachen.DE(0) of 4 provided=(3)
linuxscc004.rz.RWTH-Aachen.DE(1) of 4 provided=(3)
linuxscc005.rz.RWTH-Aachen.DE(2) of 4 provided=(3)
/opt/MPI/openmpi-1.6.1rc2mt/linux/intel/bin/mpiexec -mca oob_tcp_if_include 0.06s user 0.09s system 9% cpu 1.608 total

$ time /opt/MPI/openmpi-1.6.1rc2mt/linux/intel/bin/mpiexec -np 4 -H linuxscc005,linuxscc004 a.out
linuxscc004.rz.RWTH-Aachen.DE(1) of 4 provided=(3)
linuxscc004.rz.RWTH-Aachen.DE(3) of 4 provided=(3)
linuxscc005.rz.RWTH-Aachen.DE(0) of 4 provided=(3)
linuxscc005.rz.RWTH-Aachen.DE(2) of 4 provided=(3)
/opt/MPI/openmpi-1.6.1rc2mt/linux/intel/bin/mpiexec -np 4 -H a.out 0.04s user 0.10s system 0% cpu 23.600 total




On 08/03/12 09:29, Christopher Yeoh wrote:
I've narrowed it down to a very simple test case
(you don't need to explicitly spawn any threads).
Just need a program like:
....
If it's run with "--mpi-preconnect_mpi 1" then it hangs in MPI_Init_thread. If not,
then it hangs in MPI_Barrier. Get a backtrace that looks like this (with the former):


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915

#include <err.h>      /* errx() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>
#include <unistd.h>


int main(int argc, char **argv)
{
        char hostname[MPI_MAX_PROCESSOR_NAME];
        int rank, size, provided, laenge;

        /* Request full multi-threading support. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(hostname, &laenge);

        /* Bail out if the library cannot provide MPI_THREAD_MULTIPLE. */
        if (provided != MPI_THREAD_MULTIPLE) {
                MPI_Finalize();
                errx(1, "MPI_Init_thread expected MPI_THREAD_MULTIPLE (%d), "
                         "got %d", MPI_THREAD_MULTIPLE, provided);
        }

        printf("%s(%d) of %d provided=(%d)\n", hostname, rank, size, provided);

        /* The barrier is where the reported hang occurs without preconnect. */
        MPI_Barrier(MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
}
