Jeff Squyres wrote:
2. I know little/nothing about SGE, but I'm assuming that you need to
have SGE pass the proper memory lock limits to new processes.  In an
interactive login, you showed that the max limit is "8162952" -- you
might just want to make it unlimited, unless you have a reason for
limiting it.  See http://www.open-mpi.org/faq/?
Yes, I already read the FAQ, and even setting the limits to unlimited has
not worked. In SGE one can specify the limits for SGE jobs via e.g. the
qmon tool (Configure Queues > select queue > Modify > Limits), but
everything there is set to infinity. (Besides that, the job is running
with a static machinefile; is this a "noninteractive" job?) How can I
test the ulimits of interactive and noninteractive jobs?

Launch an SGE job that calls the shell command "limit" (if you run C-shell variants) or "ulimit -l" (if you run Bourne shell variants). Ensure that the output is "unlimited".
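For the Bourne-shell case, a minimal test job script might look like the sketch below (the script name and the qsub options in the usage line are assumptions; adapt them to your site):

```shell
#!/bin/sh
# check_memlock.sh -- print the max-locked-memory limit as seen
# from inside an SGE job; the output should read "unlimited".
echo "memlock limit: $(ulimit -l)"
```

Submit it with something like `qsub -cwd -j y check_memlock.sh` and inspect the job's output file.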

What are the limits of the user that launches the SGE daemons? I.e., did the SGE daemons get started with proper "unlimited" limits? If not, that could hamper SGE's ability to set the limits that you told it to via qmon (remember my disclaimer: I know nothing about SGE, so this is speculation).


I am assuming you have tried launching your job without SGE (e.g., via ssh or other launchers) and that it works correctly? If yes, then you should compare the outputs of limit, as Jeff suggested, to see if there are any differences between the two (with and without SGE).

I know of a similar problem with SGE's limitation that it cannot set the file descriptor limit for the user processes (and I believe the SGE folks are aware of the problem). The workaround was to put the setting into ~/.tcshrc. So if SGE is not setting other resource limits correctly, or doesn't provide the option, you may have to work around it in ~/.tcshrc or a similar settings file for your shell. Otherwise it'll probably fall back to the system default.
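The workaround above might be sketched as follows for a Bourne-style startup file such as ~/.profile (the tcsh equivalent in ~/.tcshrc would be `limit memorylocked unlimited` and `limit descriptors 4096`; the specific descriptor count is just an illustrative value):

```shell
# Raise resource limits in the shell startup file so that every
# shell SGE spawns (and thus every job process) inherits them.
# Raising a limit can fail if the hard limit is lower, so fail softly.
ulimit -l unlimited 2>/dev/null || echo "could not raise memlock limit" >&2
ulimit -n 4096 2>/dev/null || true
```

Note this only helps if SGE starts the job through a login/rc shell that actually reads the file.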

--

- Pak Lui
pak....@sun.com
