Markus Daene wrote:
> Hi.
> 
> I don't think it is necessary to specify the hosts via a hostfile when using
> SGE with OpenMPI; even $NSLOTS is not necessary. Just run
> mpirun executable
> and this works very well.

This produces the same error, but thanks for your suggestion. (Out of
interest: how does Open MPI then determine how many slots it may use?)


> Regarding your memory problem:
> I had similar problems when I specified the h_vmem option in SGE.
> Without SGE everything works, but starting with SGE gives such memory errors.
> You can easily check this with 'qconf -sc'. If you have used this option, try
> without it. The problem in my case was that OpenMPI sometimes allocates a lot
> of memory, so the job gets killed immediately by SGE and one gets such error
> messages; see my posting from some days ago. I am not sure if this helps in
> your case, but it could be an explanation.

Hmm, it seems that I'm not using such an option (for my queue, the h_vmem
and s_vmem values are set to infinity). Here is the output of the qconf
-sc command. (Sorry for posting SGE-related stuff on this mailing list.)
[~]# qconf -sc
#name               shortcut   type        relop requestable consumable default  urgency
#----------------------------------------------------------------------------------------
arch                a          RESTRING    ==    YES         NO         NONE     0
calendar            c          RESTRING    ==    YES         NO         NONE     0
cpu                 cpu        DOUBLE      >=    YES         NO         0        0
h_core              h_core     MEMORY      <=    YES         NO         0        0
h_cpu               h_cpu      TIME        <=    YES         NO         0:0:0    0
h_data              h_data     MEMORY      <=    YES         NO         0        0
h_fsize             h_fsize    MEMORY      <=    YES         NO         0        0
h_rss               h_rss      MEMORY      <=    YES         NO         0        0
h_rt                h_rt       TIME        <=    YES         NO         0:0:0    0
h_stack             h_stack    MEMORY      <=    YES         NO         0        0
h_vmem              h_vmem     MEMORY      <=    YES         NO         0        0
hostname            h          HOST        ==    YES         NO         NONE     0
load_avg            la         DOUBLE      >=    NO          NO         0        0
load_long           ll         DOUBLE      >=    NO          NO         0        0
load_medium         lm         DOUBLE      >=    NO          NO         0        0
load_short          ls         DOUBLE      >=    NO          NO         0        0
mem_free            mf         MEMORY      <=    YES         NO         0        0
mem_total           mt         MEMORY      <=    YES         NO         0        0
mem_used            mu         MEMORY      >=    YES         NO         0        0
min_cpu_interval    mci        TIME        <=    NO          NO         0:0:0    0
np_load_avg         nla        DOUBLE      >=    NO          NO         0        0
np_load_long        nll        DOUBLE      >=    NO          NO         0        0
np_load_medium      nlm        DOUBLE      >=    NO          NO         0        0
np_load_short       nls        DOUBLE      >=    NO          NO         0        0
num_proc            p          INT         ==    YES         NO         0        0
qname               q          RESTRING    ==    YES         NO         NONE     0
rerun               re         BOOL        ==    NO          NO         0        0
s_core              s_core     MEMORY      <=    YES         NO         0        0
s_cpu               s_cpu      TIME        <=    YES         NO         0:0:0    0
s_data              s_data     MEMORY      <=    YES         NO         0        0
s_fsize             s_fsize    MEMORY      <=    YES         NO         0        0
s_rss               s_rss      MEMORY      <=    YES         NO         0        0
s_rt                s_rt       TIME        <=    YES         NO         0:0:0    0
s_stack             s_stack    MEMORY      <=    YES         NO         0        0
s_vmem              s_vmem     MEMORY      <=    YES         NO         0        0
seq_no              seq        INT         ==    NO          NO         0        0
slots               s          INT         <=    YES         YES        1        1000
swap_free           sf         MEMORY      <=    YES         NO         0        0
swap_rate           sr         MEMORY      >=    YES         NO         0        0
swap_rsvd           srsv       MEMORY      >=    YES         NO         0        0
swap_total          st         MEMORY      <=    YES         NO         0        0
swap_used           su         MEMORY      >=    YES         NO         0        0
tmpdir              tmp        RESTRING    ==    NO          NO         NONE     0
virtual_free        vf         MEMORY      <=    YES         NO         0        0
virtual_total       vt         MEMORY      <=    YES         NO         0        0
virtual_used        vu         MEMORY      >=    YES         NO         0        0
# >#< starts a comment but comments are not saved across edits --------
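
For anyone who wants to repeat this check without reading the whole listing by eye, here is a minimal Python sketch that pulls the h_vmem and s_vmem rows out of qconf -sc style text and lists which complexes are marked consumable. It assumes one row per line with eight whitespace-separated columns, as in the header of the listing above. The names SAMPLE, parse_complexes and consumable_names are made up for this illustration and are not part of SGE or Open MPI.

```python
# Sketch only: parse `qconf -sc` style text and inspect the memory-limit rows.
# SAMPLE is a hand-copied excerpt of the listing above, not live qconf output.
SAMPLE = """\
#name               shortcut   type        relop requestable consumable default  urgency
h_vmem              h_vmem     MEMORY      <=    YES         NO         0        0
s_vmem              s_vmem     MEMORY      <=    YES         NO         0        0
slots               s          INT         <=    YES         YES        1        1000
"""

# Column names taken from the header line of the listing.
FIELDS = ("name", "shortcut", "type", "relop",
          "requestable", "consumable", "default", "urgency")


def parse_complexes(text):
    """Return {name: row_dict} for every non-comment line with 8 columns."""
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip anything that is not a complete row
        row = dict(zip(FIELDS, parts))
        rows[row["name"]] = row
    return rows


def consumable_names(rows):
    """Names of all complexes whose 'consumable' column is not NO."""
    return sorted(n for n, r in rows.items() if r["consumable"] != "NO")


if __name__ == "__main__":
    rows = parse_complexes(SAMPLE)
    for name in ("h_vmem", "s_vmem"):
        r = rows[name]
        print(name, "consumable:", r["consumable"], "default:", r["default"])
    print("consumables:", consumable_names(rows))
```

To run it against a real system, one could replace SAMPLE with the text captured from qconf -sc. Note that this only inspects the complex definitions; the per-queue limits (the "infinity" values mentioned above) live in the queue configuration and are not covered by this sketch.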

Thanks for your help.
