You might want to take a look at PRRTE (https://github.com/openpmix/prrte) - it 
does exactly what you describe, only from the other direction. It provides a 
customizable launcher that supports the various cmd lines, and then uses a 
common RTE backend.

We don't use SLURM_TASKS_PER_NODE for placement - we only use it to determine 
how many processes we are allowed to run on that node. This is particularly 
important in multi-tenant environments. As you have discovered, we need it, so 
removing it isn't an option.
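
To illustrate what "how many processes we are allowed to run on that node" 
means in practice, here is a minimal Python sketch that expands the compact 
form Slurm documents for SLURM_TASKS_PER_NODE (for example "2(x3),1") into one 
count per node. This is not the Open MPI or PRRTE code; the function names 
parse_tasks_per_node and allowed_tasks_per_node are hypothetical and exist 
only for this example.

```python
import os
import re


def parse_tasks_per_node(value):
    """Expand a SLURM_TASKS_PER_NODE string into one task count per node.

    Entries are comma separated; a count followed by "(xN)" applies to N
    consecutive nodes, so "2(x3),1" becomes [2, 2, 2, 1].
    """
    counts = []
    for item in value.split(","):
        match = re.fullmatch(r"(\d+)(?:\(x(\d+)\))?", item.strip())
        if match is None:
            raise ValueError(f"unrecognized entry: {item!r}")
        tasks = int(match.group(1))
        repeat = int(match.group(2)) if match.group(2) else 1
        counts.extend([tasks] * repeat)
    return counts


def allowed_tasks_per_node(env=None):
    """Read the per-node limits from the environment (hypothetical helper).

    Raises RuntimeError when the variable is missing, mirroring the idea
    that a launcher cannot know its limits without it.
    """
    env = os.environ if env is None else env
    value = env.get("SLURM_TASKS_PER_NODE")
    if value is None:
        raise RuntimeError("SLURM_TASKS_PER_NODE is not set")
    return parse_tasks_per_node(value)
```

The resulting list is in the same order as the nodes in the allocation, so 
entry i is the limit for the i-th node. A rankfile can still decide where each 
rank goes; the list only caps how many ranks may land on each node.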


On Mar 21, 2021, at 12:22 PM, Martyn Foster via devel 
<devel@lists.open-mpi.org> wrote:

Sorry for the slow reply! 

I didn't want to get fixated on why the variable was unset, though I can 
understand the existence of a check if Slurm always sets this (I don't recall 
that being the case for all configurations historically, but perhaps it is 
now). The reason I'd unset it (!) is because I was trying to build an 
environment to support completely arbitrary task placement/distribution that 
works with various launchers (orterun/srun/hydra) and it seems tasks_per_node 
being set was upsetting one of the others. 

Slurm's internal geometry parameters can't possibly describe an arbitrary 
(rankfile) layout, so I was nervous about why they would be required if a 
rankfile was provided...

Martyn

On Mon, 15 Mar 2021 at 19:57, Ralph Castain via devel 
<devel@lists.open-mpi.org> wrote:
Martyn? Why are you saying SLURM_TASKS_PER_NODE might not be present?

It sounds to me like something is wrong in your Slurm environment - I really 
believe that this envar is always supposed to be there.


> On Mar 15, 2021, at 4:20 AM, Peter Kjellström <c...@nsc.liu.se> wrote:
> 
> On Fri, 12 Mar 2021 22:19:09 +0000
> Ralph Castain via devel <devel@lists.open-mpi.org> wrote:
> 
>> Why would it not be set? AFAICT, Slurm is supposed to always set that
>> envar, or so we've been told.
> 
> Maybe confusion on the exact name?
> 
> AFAIK Slurm always sets SLURM_TASKS_PER_NODE but only sets
> SLURM_NTASKS_PER_NODE (almost same name) when --ntasks-per-node is
> given.
> 
> /Peter K


