> Are you referring to this SEGV error here? I am assuming this is OMPI 
> 1.1.1 so you are using rsh PLS to launch your executables (using loose 
> integration).

Oops, my mistake: these errors come from OMPI 1.2.3, which I built against
OFED 1.1. This problem has nothing to do with SGE anymore. (Jeff suggested
that I migrate to a "slightly" newer version, so I tried, and it failed with
these errors.) Should I start a new thread on this, since the SGE question
is solved?

>  >-sh-3.00$ ompi/bin/mpirun -d -np 2 -H node03,node06 hostname
>  > [headnode:23178] connect_uni: connection not allowed
>  > [headnode:23178] connect_uni: connection not allowed
>  > [headnode:23178] connect_uni: connection not allowed
>  > [headnode:23178] connect_uni: connection not allowed
>  > [headnode:23178] connect_uni: connection not allowed
>  > [headnode:23178] connect_uni: connection not allowed
>  > [headnode:23178] connect_uni: connection not allowed
>  > [headnode:23178] connect_uni: connection not allowed
>  > [headnode:23178] connect_uni: connection not allowed
>  > [headnode:23178] connect_uni: connection not allowed
>  > [headnode:23178] [0,0,0] setting up session dir with
>  > [headnode:23178]        universe default-universe-23178
>  > [headnode:23178]        user me
>  > [headnode:23178]        host headnode
>  > [headnode:23178]        jobid 0
>  > [headnode:23178]        procid 0
>  > [headnode:23178] procdir:
>  > /tmp/openmpi-sessions-me@headnode_0/default-universe-23178/0/0
>  > [headnode:23178] jobdir:
>  > /tmp/openmpi-sessions-me@headnode_0/default-universe-23178/0
>  > [headnode:23178] unidir:
>  > /tmp/openmpi-sessions-me@headnode_0/default-universe-23178
>  > [headnode:23178] top: openmpi-sessions-me@headnode_0
>  > [headnode:23178] tmp: /tmp
>  > [headnode:23178] [0,0,0] contact_file
>  > /tmp/openmpi-sessions-me@headnode_0/default-universe-23178/universe-setup.txt
>  > [headnode:23178] [0,0,0] wrote setup file
>  > [headnode:23178] *** Process received signal ***
>  > [headnode:23178] Signal: Segmentation fault (11)
>  > [headnode:23178] Signal code: Address not mapped (1)
>  > [headnode:23178] Failing at address: 0x1
>  > [headnode:23178] [ 0] /lib64/tls/libpthread.so.0 [0x39ed80c430]
>  > [headnode:23178] [ 1] /lib64/tls/libc.so.6(strcmp+0) [0x39ecf6ff00]
>  > [headnode:23178] [ 2]
>  > /home/me/ompi/lib/openmpi/mca_pls_rsh.so(orte_pls_rsh_launch+0x24f)
>  > [0x2a9723cc7f]
>  > [headnode:23178] [ 3] /home/me/ompi/lib/openmpi/mca_rmgr_urm.so
>  > [0x2a9764fa90]
>  > [headnode:23178] [ 4] /home/me/ompi/bin/mpirun(orterun+0x35b)
>  > [0x402ca3]
>  > [headnode:23178] [ 5] /home/me/ompi/bin/mpirun(main+0x1b) [0x402943]
>  > [headnode:23178] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb)
>  > [0x39ecf1c3fb]
>  > [headnode:23178] [ 7] /home/me/ompi/bin/mpirun [0x40289a]
>  > [headnode:23178] *** End of error message ***
>  > Segmentation fault
> 
> So is it true that SEGV only occurred under the SGE environment and not 
> a normal environment? If it is then I am baffled because starting rsh 
> pls under the SGE environment in 1.1.1 should be no different than 
> starting rsh pls without SGE.

Nope, the segfault is not specific to SGE. The config.log and the
"ompi_info --all" output are attached to an earlier post in this thread.
Sorry for mixing the two topics.

Thank you.
