Jeff Squyres wrote:

>> Hmm, I've heard about conflicts between OMPI 1.2.x and OFED 1.1 (sorry,
>> no reference here),
> 
> I'm unaware of any problems with OMPI 1.2.x and OFED 1.1.  I run OFED  
> 1.1 on my cluster at Cisco and have many different versions of OMPI  
> installed (1.2, trunk, etc.).

Yes, you are right; I misread. The OMPI 1.2 changelog (README) says that
OFED 1.0 is not considered to work with OMPI 1.2. Sorry.

>> and I've had no luck producing a working OMPI installation
>> ("mpirun --help" runs, and ./IMB-MPI compiles and runs too, but
>> "mpirun -np 2 node03,node14 IMB-MPI1" doesn't (segmentation
>> fault))...
> 
> Can you send more information on this?  See
> http://www.open-mpi.org/community/help/

-sh-3.00$ ompi/bin/mpirun -d -np 2 -H node03,node06 hostname
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] connect_uni: connection not allowed
[headnode:23178] [0,0,0] setting up session dir with
[headnode:23178]        universe default-universe-23178
[headnode:23178]        user me
[headnode:23178]        host headnode
[headnode:23178]        jobid 0
[headnode:23178]        procid 0
[headnode:23178] procdir: /tmp/openmpi-sessions-me@headnode_0/default-universe-23178/0/0
[headnode:23178] jobdir: /tmp/openmpi-sessions-me@headnode_0/default-universe-23178/0
[headnode:23178] unidir: /tmp/openmpi-sessions-me@headnode_0/default-universe-23178
[headnode:23178] top: openmpi-sessions-me@headnode_0
[headnode:23178] tmp: /tmp
[headnode:23178] [0,0,0] contact_file /tmp/openmpi-sessions-me@headnode_0/default-universe-23178/universe-setup.txt
[headnode:23178] [0,0,0] wrote setup file
[headnode:23178] *** Process received signal ***
[headnode:23178] Signal: Segmentation fault (11)
[headnode:23178] Signal code: Address not mapped (1)
[headnode:23178] Failing at address: 0x1
[headnode:23178] [ 0] /lib64/tls/libpthread.so.0 [0x39ed80c430]
[headnode:23178] [ 1] /lib64/tls/libc.so.6(strcmp+0) [0x39ecf6ff00]
[headnode:23178] [ 2] /home/me/ompi/lib/openmpi/mca_pls_rsh.so(orte_pls_rsh_launch+0x24f) [0x2a9723cc7f]
[headnode:23178] [ 3] /home/me/ompi/lib/openmpi/mca_rmgr_urm.so [0x2a9764fa90]
[headnode:23178] [ 4] /home/me/ompi/bin/mpirun(orterun+0x35b) [0x402ca3]
[headnode:23178] [ 5] /home/me/ompi/bin/mpirun(main+0x1b) [0x402943]
[headnode:23178] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x39ecf1c3fb]
[headnode:23178] [ 7] /home/me/ompi/bin/mpirun [0x40289a]
[headnode:23178] *** End of error message ***
Segmentation fault


>> Yes, I already read the FAQ, and even setting them to unlimited has
>> not worked. In SGE one can specify the limits for SGE jobs, e.g. with
>> the qmon tool (configuring queues > select queue > modify > limits),
>> but everything there is set to infinity. (Besides that, the job runs
>> with a static machinefile; is this a "noninteractive" job?) How could
>> I test the ulimits of interactive and noninteractive jobs?
> 
> Launch an SGE job that calls the shell command "limit" (if you run
> C-shell variants) or "ulimit -l" (if you run Bourne shell variants).
> Ensure that the output is "unlimited".

I've already done that, but how can I distinguish between the ulimits of
a tightly coupled job and those of a loosely coupled job? I tried passing
$TMPDIR/machines to a shell script which in turn runs "ulimit -a",
*assuming* this counts as a tightly coupled job, but every node returned
unlimited, and the same happened without $TMPDIR/machines. Even the
headnode is set to unlimited.
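One way to make the check more targeted is a tiny script that reports only the locked-memory limit ("ulimit -l") together with the host it ran on. This is a minimal sketch; the script name check_memlock.sh and the function check_memlock are hypothetical, not part of Open MPI or SGE.

```shell
#!/bin/bash
# check_memlock.sh: hypothetical helper, not part of Open MPI or SGE.
# Prints the host name and the locked-memory limit ("ulimit -l"),
# i.e. the limit discussed in this thread.
check_memlock() {
    local limit
    limit=$(ulimit -l)
    echo "$(uname -n): max locked memory = ${limit}"
    # Return status 0 only if the limit is "unlimited".
    [ "$limit" = "unlimited" ]
}

check_memlock
```

Submitting it with qsub shows the limits of the job shell itself. Launching it through the same mechanism the MPI processes use (assuming the script sits on a shared filesystem), e.g. `mpirun -np 2 -H node03,node06 ./check_memlock.sh`, should show what the launched processes actually inherit. Roughly speaking, with tight integration these come from the SGE execution daemon, while with loose integration they come from the rsh/ssh login on each node.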

> What are the limits of the user that launches the SGE daemons?  I.e.,
> did the SGE daemons get started with proper "unlimited" limits?  If
> not, that could hamper SGE's ability to set the limits that you told
> it to via qmon (remember my disclaimer: I know nothing about SGE, so
> this is speculation).

The limits in /etc/security/limits.conf apply to all users (via a '*'
wildcard), so the SGE processes and daemons shouldn't have any limits.
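For reference, a wildcard entry of the kind mentioned above, raising the locked-memory limit for all users, would typically look like this in pam_limits syntax. It is illustrative only; the actual entries on this cluster are not shown. One caveat worth checking: pam_limits applies these values when a PAM session is opened (e.g. at login or through sshd), so daemons started from boot-time init scripts may not pick them up.

```
# /etc/security/limits.conf (illustrative)
# <domain>  <type>  <item>    <value>
*           soft    memlock   unlimited
*           hard    memlock   unlimited
```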

But thanks anyway; I will post this issue to an SGE mailing list soon.
The config.log and the output of `ompi_info --all` are attached. Thanks
again to all of you.


Attachment: logs.tbz
Description: application/bzip-compressed-tar
