> Markus Daene wrote:
> > Hi.
> >
> > I think it is not necessary to specify the hosts via a hostfile when using
> > SGE with OpenMPI; even $NSLOTS is not necessary. Just run
> > "mpirun executable"; this works very well.
>
> This produces the same error, but thanks for your suggestion. (Out of
> interest: how does ompi then control how many slots it may use?)

It just knows it. I think the developers could answer this question.

> > Regarding your memory problem:
> > I had similar problems when I specified the h_vmem option in SGE.
> > Without SGE everything works, but starting with SGE gives such memory
> > errors. You can easily check this with 'qconf -sc'. If you have used this
> > option, try without it. The problem in my case was that OpenMPI sometimes
> > allocates a lot of memory and the job gets killed immediately by SGE, and
> > one gets such error messages; see my posting some days ago. I am not sure
> > if this helps in your case, but it could be an explanation.
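
As a rough sketch of the 'qconf -sc' check mentioned above, the following
filter lists requestable MEMORY complexes whose default value is 0. It assumes
the unwrapped eight-column layout that qconf prints (name, shortcut, type,
relop, requestable, consumable, default, urgency); the function name
zero_default_memory is made up for illustration.

```shell
#!/bin/sh
# Sketch: list requestable MEMORY complexes whose default value is 0.
# Assumes one complex per line with eight whitespace-separated columns:
#   name shortcut type relop requestable consumable default urgency
# zero_default_memory is a hypothetical helper name, not an SGE command.
zero_default_memory() {
    awk '$1 !~ /^#/ && $3 == "MEMORY" && $5 == "YES" && $7 == "0" { print $1 }'
}

# Usage on a host with SGE installed:
#   qconf -sc | zero_default_memory
```

If h_vmem shows up in the result, the complex is requestable with a default
of 0.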

I am sorry to discuss SGE matters here as well, but the question came up, and
it should be made clear that this is not only related to OMPI.

I think your output shows exactly the problem: you have set h_vmem as
requestable with a default value of 0, so the job has no memory at all. OMPI
somehow knows that it has only the memory granted by SGE, so it cannot
allocate any memory in this case. Of course you get the errors.
You should either set h_vmem to not requestable, or set a proper default
value (e.g. 2.0G), or specify the memory consumption in your job script like
#$ -l h_vmem=2000M
It does not matter that your queue has h_vmem set to infinity; this only gives
you the maximum that you can request.
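
A minimal job script along these lines might look as follows. The parallel
environment name "orte", the slot count, and the executable name are
placeholders and not taken from this thread; only the h_vmem request and the
plain mpirun call come from the discussion above.

```shell
#!/bin/sh
# Sketch of an SGE job script; adjust the placeholders for your site.
#$ -S /bin/sh
#$ -cwd
# "orte" and 4 are assumed values for the parallel environment and slot count
#$ -pe orte 4
# request memory explicitly instead of relying on the complex default
#$ -l h_vmem=2000M

# no hostfile and no $NSLOTS, as suggested earlier in the thread
mpirun ./executable
```

The same request can also be given at submission time, e.g.
"qsub -l h_vmem=2000M job.sh", where job.sh is a hypothetical file name.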

Markus


> Hmm, it seems that I'm not using such an option (for my queue the h_vmem
> and s_vmem values are set to infinity). Here is the output of the qconf
> -sc command. (Sorry for posting SGE-related stuff on this mailing list):
> [~]# qconf -sc
> #name               shortcut   type        relop requestable consumable default  urgency
> #----------------------------------------------------------------------------------------
> arch                a          RESTRING    ==    YES         NO         NONE     0
> calendar            c          RESTRING    ==    YES         NO         NONE     0
> cpu                 cpu        DOUBLE      >=    YES         NO         0        0
> h_core              h_core     MEMORY      <=    YES         NO         0        0
> h_cpu               h_cpu      TIME        <=    YES         NO         0:0:0    0
> h_data              h_data     MEMORY      <=    YES         NO         0        0
> h_fsize             h_fsize    MEMORY      <=    YES         NO         0        0
> h_rss               h_rss      MEMORY      <=    YES         NO         0        0
> h_rt                h_rt       TIME        <=    YES         NO         0:0:0    0
> h_stack             h_stack    MEMORY      <=    YES         NO         0        0
> h_vmem              h_vmem     MEMORY      <=    YES         NO         0        0
> hostname            h          HOST        ==    YES         NO         NONE     0
> load_avg            la         DOUBLE      >=    NO          NO         0        0
> load_long           ll         DOUBLE      >=    NO          NO         0        0
> load_medium         lm         DOUBLE      >=    NO          NO         0        0
> load_short          ls         DOUBLE      >=    NO          NO         0        0
> mem_free            mf         MEMORY      <=    YES         NO         0        0
> mem_total           mt         MEMORY      <=    YES         NO         0        0
> mem_used            mu         MEMORY      >=    YES         NO         0        0
> min_cpu_interval    mci        TIME        <=    NO          NO         0:0:0    0
> np_load_avg         nla        DOUBLE      >=    NO          NO         0        0
> np_load_long        nll        DOUBLE      >=    NO          NO         0        0
> np_load_medium      nlm        DOUBLE      >=    NO          NO         0        0
> np_load_short       nls        DOUBLE      >=    NO          NO         0        0
> num_proc            p          INT         ==    YES         NO         0        0
> qname               q          RESTRING    ==    YES         NO         NONE     0
> rerun               re         BOOL        ==    NO          NO         0        0
> s_core              s_core     MEMORY      <=    YES         NO         0        0
> s_cpu               s_cpu      TIME        <=    YES         NO         0:0:0    0
> s_data              s_data     MEMORY      <=    YES         NO         0        0
> s_fsize             s_fsize    MEMORY      <=    YES         NO         0        0
> s_rss               s_rss      MEMORY      <=    YES         NO         0        0
> s_rt                s_rt       TIME        <=    YES         NO         0:0:0    0
> s_stack             s_stack    MEMORY      <=    YES         NO         0        0
> s_vmem              s_vmem     MEMORY      <=    YES         NO         0        0
> seq_no              seq        INT         ==    NO          NO         0        0
> slots               s          INT         <=    YES         YES        1        1000
> swap_free           sf         MEMORY      <=    YES         NO         0        0
> swap_rate           sr         MEMORY      >=    YES         NO         0        0
> swap_rsvd           srsv       MEMORY      >=    YES         NO         0        0
> swap_total          st         MEMORY      <=    YES         NO         0        0
> swap_used           su         MEMORY      >=    YES         NO         0        0
> tmpdir              tmp        RESTRING    ==    NO          NO         NONE     0
> virtual_free        vf         MEMORY      <=    YES         NO         0        0
> virtual_total       vt         MEMORY      <=    YES         NO         0        0
> virtual_used        vu         MEMORY      >=    YES         NO         0        0
> # >#< starts a comment but comments are not saved across edits --------
>
> thanks for your help.
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
