devel-boun...@open-mpi.org wrote on 02/17/2012 08:36:54 AM:

> From: Brice Goglin <brice.gog...@inria.fr>
> To: de...@open-mpi.org
> Date: 02/17/2012 08:37 AM
> Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> processes as bound if the job has been launched by srun
> Sent by: devel-boun...@open-mpi.org
> 
> On 16/02/2012 14:16, nadia.der...@bull.net wrote: 
> Hi Jeff, 
> 
> Sorry for the delay, but my victim with 2 IB devices had been stolen ;-) 
> 
> So, I ported the patch to the v1.5 branch and finally could test it. 
> 
> Actually, there is no opal_hwloc_base_get_topology() in v1.5, so I had
> to set the hwloc flags in ompi_mpi_init() and orte_odls_base_open()
> (i.e. the places where opal_hwloc_topology is initialized). 
> 
> With the new flag set, hwloc_get_nbobjs_by_type(opal_hwloc_topology,
> HWLOC_OBJ_CORE) now sees the actual number of cores on the node
> (instead of 1 when our cpuset is a singleton). 
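
(For reference, the flag setting boils down to something like the
minimal sketch below; I am assuming the flag in question is
HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM and using the plain hwloc 1.x API
rather than the actual OMPI code:)

    #include <hwloc.h>
    #include <stdio.h>

    int main(void)
    {
        hwloc_topology_t topology;

        hwloc_topology_init(&topology);
        /* Expose the whole machine instead of only the cpuset we run
         * in (assumed flag, see above). */
        hwloc_topology_set_flags(topology, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
        hwloc_topology_load(topology);

        /* With the flag set, this reports the real number of cores
         * even when our cpuset is a singleton. */
        printf("cores: %d\n",
               hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE));

        hwloc_topology_destroy(topology);
        return 0;
    }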
> 
> Since opal_paffinity_base_get_processor_info() calls
> module_get_processor_info() (in hwloc/paffinity_hwloc_module.c), which
> in turn calls hwloc_get_nbobjs_by_type(), we are now getting the right
> number of cores in get_ib_dev_distance(). 
> 
> So we are looping over the exact number of cores, looking for a 
> potential binding. 
> 
> In conclusion, there's no need for any other patch: the fix you
> committed was the only one needed to fix the issue. 
> 
> I didn't follow this entire thread in detail, but I have a feeling
> that something is wrong here. The flag fixes your problem indeed, but
> I think it may break binding too. It's basically making all
> "unavailable resources" available. So the binding code may end up
> trying to bind processes to cores that it can't actually use.

It's true that if we have a resource manager that can allocate, say, a
single socket within a node, the binding part of OMPI might go beyond
its actual boundaries.
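
For instance, with the whole-system view a core object can exist in the
topology while lying outside the cpuset we were actually given. A sketch
of the guard the binding code would need (my assumption, not OMPI's
actual code):

    #include <hwloc.h>

    /* "Physically present" is not enough once the whole-system flag
     * is set; the core must also be inside our allowed cpuset. */
    static int core_is_usable(hwloc_topology_t topo, unsigned idx)
    {
        hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, idx);
        if (core == NULL)
            return 0;
        return hwloc_bitmap_isincluded(core->cpuset,
                                       hwloc_topology_get_allowed_cpuset(topo));
    }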

> 
> If srun gives you the first cores of the machine, it works fine
> because OMPI tries to use the first cores and those are available.
> But did you ever try when srun gives the second socket only, for
> instance? Or whichever part of the machine does not contain the
> first cores?

Not yet: I first have to find the proper option in slurm. I don't know
whether slurm allows such a fine-grained allocation, i.e. whether there
is an option that makes it possible to allocate socket X (X != 0).

> I think OMPI will still try to bind on the first cores
> if the flag is set, but those are not available for binding.
> 
> Unless I am missing something, the proper fix would be to have two 
> instances of the topology: one with the entire machine (for people 
> who really want to consult all physical resources), and one with the
> really available part of the machine (mostly used for binding).

Agreed! 
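
Something like this, I suppose (a minimal sketch of the two-topology
idea, with hypothetical names, not a definitive implementation):

    #include <hwloc.h>

    hwloc_topology_t topo_whole;  /* entire machine: for consulting resources */
    hwloc_topology_t topo_avail;  /* available part: for binding decisions */

    static void init_topologies(void)
    {
        hwloc_topology_init(&topo_whole);
        /* Assumed flag: expose resources outside our own cpuset too. */
        hwloc_topology_set_flags(topo_whole, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
        hwloc_topology_load(topo_whole);

        hwloc_topology_init(&topo_avail);
        /* Default behavior: restricted to what we were actually given. */
        hwloc_topology_load(topo_avail);
    }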

Regards,
Nadia
> 
> Brice
