Hi Pak,

> Jeff Squyres wrote:
>>>> 2. I know little/nothing about SGE, but I'm assuming that you need to
>>>> have SGE pass the proper memory lock limits to new processes. In an
>>>> interactive login, you showed that the max limit is "8162952" -- you
>>>> might just want to make it unlimited, unless you have a reason for
>>>> limiting it. See http://www.open-mpi.org/faq/?
>>> Yes, I already read the FAQ, and even setting them to unlimited has
>>> shown not to work. In SGE one can specify the limits for SGE jobs,
>>> e.g. with the qmon tool (configuring queues > select queue > modify >
>>> limits), but there everything is set to infinity. (Besides that, the
>>> job is running with a static machinefile (is this a "noninteractive"
>>> job?)) How could I test the ulimits of interactive and noninteractive
>>> jobs?
>> Launch an SGE job that calls the shell command "limit" (if you run
>> C-shell variants) or "ulimit -l" (if you run Bourne shell variants).
>> Ensure that the output is "unlimited".
>>
>> What are the limits of the user that launches the SGE daemons? I.e.,
>> did the SGE daemons get started with proper "unlimited" limits? If
>> not, that could hamper SGE's ability to set the limits that you told
>> it to via qmon (remember my disclaimer: I know nothing about SGE, so
>> this is speculation).
>>
>
> I am assuming you have tried without using SGE (like via ssh or others)
> to launch your job and that works correctly? If yes then you should
> compare the outputs of limit as Jeff suggested to see if there is any
> difference between the two (with and without using SGE).
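For the Bourne-shell case, a minimal job script for the test Jeff describes might look like the sketch below. It assumes a Bourne-compatible /bin/sh on the execution hosts; the file name check_memlock.sh and the helper report_memlock are placeholders of mine, while the "#$" lines are ordinary embedded qsub options.

```shell
#!/bin/sh
# Minimal SGE job script sketch; submit with e.g. "qsub check_memlock.sh"
# (the file name is a placeholder).
#$ -S /bin/sh
#$ -cwd

# Hypothetical helper: report the host name and the max locked memory
# limit (ulimit -l) of the shell that SGE started for this job.
report_memlock() {
    echo "host: $(uname -n)"
    echo "max locked memory (ulimit -l): $(ulimit -l)"
}

report_memlock
```

The job's output file should then show "unlimited" on the second line if SGE passes the limit through.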
Yes, without SGE everything works. With SGE it works too if I use a
static machinefile (see my initial post), and -H h1,...,hn works as
well. Only with the $TMPDIR/machines file generated by SGE (which is
itself valid, I checked this) does the job not run. And the ulimits are
unlimited in all three cases:

pos1: pdsh -R ssh -w node[XX-YY] ulimit -a  => unlimited (loosely coupled)
pos2: qsub jobscript, where jobscript just calls the same command as in
      pos1 (tightly coupled?)
pos3: qsub jobscript, where jobscript calls another script (containing
      the same command as in pos1) and additionally passes
      $TMPDIR/machines to it as an argument.

Thanks for your help.
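One thing pos1 does not cover: pdsh reports the limits of a shell started by sshd, not of the processes that mpirun itself starts inside the SGE job, which may go through a different launch path. A sketch that checks the latter is below. It assumes a parallel environment named "orte" with 4 slots (both placeholders, use whatever your site configured) and Open MPI's mpirun in PATH; show_machines is a hypothetical helper.

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
# Assumption: "orte" is the name of an Open MPI parallel environment on
# this cluster and 4 slots are requested; both are placeholders.
#$ -pe orte 4

# Hypothetical helper: print the machine file given as $1, or complain
# on stderr and return non-zero if it cannot be read.
show_machines() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "cannot read machine file: $1" >&2
        return 1
    fi
}

MACHINES="$TMPDIR/machines"
show_machines "$MACHINES"

# Start a non-MPI command through mpirun with the SGE-generated machine
# file, so each output line shows the locked-memory limit that processes
# launched this way inherit. Skipped if mpirun is not in PATH.
if command -v mpirun >/dev/null 2>&1; then
    mpirun -machinefile "$MACHINES" sh -c 'echo "$(uname -n): $(ulimit -l)"'
else
    echo "mpirun not found in PATH; skipping launch test" >&2
fi
```

If those lines show a finite value while pos1 shows unlimited, that would point at the limits the SGE execution daemons hand down, as Jeff speculated.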