devel-boun...@open-mpi.org wrote on 02/17/2012 08:36:54 AM:

> From: Brice Goglin <brice.gog...@inria.fr>
> To: de...@open-mpi.org
> Date: 02/17/2012 08:37 AM
> Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> processes as bound if the job has been launched by srun
> Sent by: devel-boun...@open-mpi.org
>
> On 16/02/2012 14:16, nadia.der...@bull.net wrote:
> Hi Jeff,
>
> Sorry for the delay, but my victim with 2 ib devices had been stolen ;-)
>
> So, I ported the patch on the v1.5 branch and finally could test it.
> Actually, there is no opal_hwloc_base_get_topology() in v1.5, so I had to set
> the hwloc flags in ompi_mpi_init() and orte_odls_base_open() (i.e. the places
> where opal_hwloc_topology is initialized).
>
> With the new flag set, hwloc_get_nbobjs_by_type(opal_hwloc_topology,
> HWLOC_OBJ_CORE) now sees the actual number of cores on the node
> (instead of 1 when our cpuset is a singleton).
>
> Since opal_paffinity_base_get_processor_info() calls module_get_processor_info()
> (in hwloc/paffinity_hwloc_module.c), which in turn calls hwloc_get_nbobjs_by_type(),
> we are now getting the right number of cores in get_ib_dev_distance().
> So we are looping over the exact number of cores, looking for a
> potential binding.
>
> As a conclusion, there's no need for any other patch: the fix you committed
> was the only one needed to fix the issue.

> I didn't follow this entire thread in detail, but I feel that
> something is wrong here. The flag does fix your problem, but I
> think it may break binding too. It's basically making all
> "unavailable resources" available. So the binding code may end up
> trying to bind processes on cores that it can't actually use.

It's true that if we have a resource manager that can allocate, say, a single
socket within a node, the binding part of OMPI might go beyond its actual
boundaries.

> If srun gives you the first cores of the machine, it works fine
> because OMPI tries to use the first cores and those are available.
> But did you ever try when srun gives you only the second socket, for
> instance? Or whichever part of the machine does not contain the
> first cores?

But I have to look for the proper option in slurm: I don't know whether slurm
allows such fine-grained allocation. I have to find the option that makes it
possible to allocate socket X (X != 0).

> I think OMPI will still try to bind on the first cores
> if the flag is set, but those are not available for binding.
>
> Unless I am missing something, the proper fix would be to have two
> instances of the topology: one with the entire machine (for people
> that really want to consult all physical resources), and one with the
> really available part of the machine (mostly used for binding).

Agreed!

Regards,
Nadia

> Brice
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
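
P.S.: for anyone who wants to see the effect outside of the OMPI tree, below is
a minimal standalone sketch of the behavior being discussed -- assuming the
flag in question is HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM (the hwloc flag that
exposes resources outside the current cpuset) and assuming hwloc 1.x. It also
builds the second, cpuset-restricted topology Brice proposes. Variable names
are made up for the example; this is not code from the OMPI patch.

/* sketch only: one "whole machine" topology and one restricted to our cpuset */
#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t whole, avail;

    /* topology 1: the entire node, regardless of the cpuset we run in;
     * this is what makes hwloc_get_nbobjs_by_type() report the real
     * number of cores instead of 1 when our cpuset is a singleton */
    hwloc_topology_init(&whole);
    hwloc_topology_set_flags(whole, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
    hwloc_topology_load(whole);

    /* topology 2: hwloc's default view, restricted to the cpuset the
     * resource manager (srun here) gave us -- the one binding should use */
    hwloc_topology_init(&avail);
    hwloc_topology_load(avail);

    printf("cores on the node     : %d\n",
           hwloc_get_nbobjs_by_type(whole, HWLOC_OBJ_CORE));
    printf("cores available to us : %d\n",
           hwloc_get_nbobjs_by_type(avail, HWLOC_OBJ_CORE));

    hwloc_topology_destroy(avail);
    hwloc_topology_destroy(whole);
    return 0;
}

Compiled with -lhwloc and run inside a cpuset confined to a single core, the
first count should still show every core on the node while the second should
drop to 1, which is the difference between "consulting all physical resources"
and "what we may actually bind to".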