On Jun 22, 2007, at 3:52 AM, sad...@gmx.net wrote:

1. You might want to update your version of Open MPI if possible; the
v1.1.1 version is quite old.  We have added many new bug fixes and
features since v1.1.1 (including tight SGE integration).  There is
nothing special about the Open MPI that is included in the OFED
distribution; you can download a new version from the Open MPI web
site (the current stable version is v1.2.3), configure, compile, and
install it with your current OFED installation.  You should be able
to configure Open MPI with:

Hmm, I've heard about conflicts with OMPI 1.2.x and OFED 1.1 (sorry, no
reference here),

I'm unaware of any problems with OMPI 1.2.x and OFED 1.1. I run OFED 1.1 on my cluster at Cisco and have many different versions of OMPI installed (1.2, trunk, etc.).

and I've had no luck producing a working OMPI
installation ("mpirun --help" runs, and ./IMB-MPI compiles and runs too,
but "mpirun -np 2 node03,node14 IMB-MPI1" doesn't (segmentation
fault))...

Can you send more information on this? See http://www.open-mpi.org/community/help/

(Besides that, I know that OFED 1.1 is quite old too.) So I
tested it with OMPI 1.1.5 => same error.

*IF* all goes well, OFED 1.2 should be released today (famous last words).

2. I know little/nothing about SGE, but I'm assuming that you need to
have SGE pass the proper memory lock limits to new processes.  In an
interactive login, you showed that the max limit is "8162952" -- you
might just want to make it unlimited, unless you have a reason for
limiting it.  See http://www.open-mpi.org/faq/?

Yes, I already read the FAQ, and even setting the limits to unlimited
did not work. In SGE one can specify the limits for SGE jobs with,
e.g., the qmon tool (configuring queues > select queue > modify >
limits), but everything there is set to infinity. (Besides that, the
job is running with a static machinefile; is this a "noninteractive"
job?) How could I test the ulimits of interactive and noninteractive
jobs?

Launch an SGE job that calls the shell command "limit" (if you run C-shell variants) or "ulimit -l" (if you run Bourne shell variants). Ensure that the output is "unlimited".
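
If it is easier to compare the two cases from inside a process rather than from the shell, something like the following might help. It is only a sketch: it assumes Python with the standard "resource" module is available on the compute nodes, and the names format_limit and memlock_report are made up for illustration. It reads the same locked-memory limit that "ulimit -l" reports.

```python
import resource
import socket


def format_limit(value):
    """Render an rlimit value roughly the way "ulimit -l" would."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    # getrlimit() reports bytes; "ulimit -l" reports kbytes.
    return str(value // 1024)


def memlock_report():
    """Return one line with this host's soft and hard locked-memory limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return "%s: memlock soft=%s hard=%s" % (
        socket.gethostname(),
        format_limit(soft),
        format_limit(hard),
    )


if __name__ == "__main__":
    print(memlock_report())
```

Run it once from an interactive login on a compute node and once as the command of an SGE job submitted the way you normally submit jobs, then compare the two lines. If the SGE run prints a number where the interactive run prints "unlimited", the limit is being lowered somewhere on the SGE side.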

What are the limits of the user that launches the SGE daemons? I.e., did the SGE daemons get started with proper "unlimited" limits? If not, that could hamper SGE's ability to set the limits that you told it to via qmon (remember my disclaimer: I know nothing about SGE, so this is speculation).

--
Jeff Squyres
Cisco Systems
