Hi Pak,

> Jeff Squyres wrote:
>>>> 2. I know little/nothing about SGE, but I'm assuming that you need to
>>>> have SGE pass the proper memory lock limits to new processes.  In an
>>>> interactive login, you showed that the max limit is "8162952" -- you
>>>> might just want to make it unlimited, unless you have a reason for
>>>> limiting it.  See http://www.open-mpi.org/faq/?
>>> yes I allready read the faq, and even setting them to unlimited has
>>> shown not be working. In the SGE one could specify the limits to
>>> SGE-jobs by e.g. the qmon tool, (configuring queues > select queue >
>>> modify > limits) But there is everything set to infinity. (Beside  
>>> that,
>>> the job is running with a static machinefile (is this an
>>> "noninteractive" job?)) How could I test ulimits of interactive and
>>> noninteractive jobs?
>> Launch an SGE job that calls the shell command "limit" (if you run C- 
>> shell variants) or "ulimit -l" (if you run Bourne shell variants).   
>> Ensure that the output is "unlimited".
>>
>> What are the limits of the user that launches the SGE daemons?  I.e.,  
>> did the SGE daemons get started with proper "unlimited" limits?  If  
>> not, that could hamper SGE's ability to set the limits that you told  
>> it to via qmon (remember my disclaimer: I know nothing about SGE, so  
>> this is speculation).
>>
> 
> I am assuming you have tried without using SGE (like via ssh or others) 
> to launch your job and that works correctly? If yes then you should 
> compare the outputs of limit as Jeff suggested to see if they are any 
> difference between the two (with and without using SGE).

Yes, without SGE everything works. With SGE it also works if I use a static
machinefile (see initial post) or pass the hosts with -H h1,...,hn. Only with
the $TMPDIR/machines file generated by SGE (which is valid; I checked
this) does the job fail to run. And the ulimits are unlimited in all three
of the following cases:

pos1: pdsh -R ssh -w node[XX-YY] ulimit -a => unlimited

(loosely coupled)
pos2: qsub jobscript, where jobscript just calls the command from pos1

(tightly coupled?)
pos3: qsub jobscript, where jobscript calls another script (containing
the same command as in pos1) and additionally passes $TMPDIR/machines
as an argument to it.
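
For reference, a minimal sketch of the kind of jobscript used for such a
check. The function names report_limits and report_machines are made up
for illustration. It assumes bash is available on the nodes and that SGE
sets JOB_ID and TMPDIR inside a job. It is not the exact script from my
tests.

```shell
#!/bin/bash
# Hypothetical helper: print the locked-memory limit seen by this shell,
# tagged with a label so outputs from different launch paths can be compared.
report_limits() {
    local label=$1
    echo "[$label] host=$(uname -n) memlock=$(ulimit -l)"
}

# Report whether the SGE-generated machines file is readable.
# Assumption: it lives at $TMPDIR/machines, as in the cases above.
# Outside SGE, TMPDIR may be unset, so fall back to /tmp.
report_machines() {
    local file=${TMPDIR:-/tmp}/machines
    if [ -r "$file" ]; then
        echo "machines file: $file ($(wc -l < "$file") lines)"
    else
        echo "machines file: $file not readable"
    fi
}

# JOB_ID is assumed to be set by SGE; use a fixed label when run by hand.
report_limits "${JOB_ID:-interactive}"
report_machines
```

Running it once by hand and once through qsub should give two outputs that
can be compared line by line.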

Thanks for your help.
