On Fri, Feb 17, 2012 at 8:47 AM, Brice Goglin <brice.gog...@inria.fr> wrote:
> On 17/02/2012 14:59, Jeff Squyres wrote:
> > On Feb 17, 2012, at 8:21 AM, Ralph Castain wrote:
> >
> >>> I didn't follow this entire thread in detail, but I feel that something is wrong here. The flag fixes your problem indeed, but I think it may break binding too. It's basically making all "unavailable resources" available. So the binding code may end up trying to bind processes on cores that it can't actually use.
> >
> > I'm not sure I follow here -- which binding code are you referring to: that in hwloc, or that in OMPI?
> >
> > My understanding of what we should be doing is to compare the output bitmask from hwloc_get_cpubind() with the allowed_cpuset on the HWLOC_OBJ_MACHINE. If the set we are bound to is smaller than (a strict subset of) the allowed cpuset, then the process is bound.
> >
> > Is that correct?
>
> Yes.
> I didn't know you already used allowed_cpuset instead of cpuset; good to know.
>
> > And per Ralph's question, the allowed_cpuset of HWLOC_OBJ_MACHINE will still be accurate even if we do WHOLE_SYSTEM, right?
>
> Yes.
>
> > E.g., if some external agent creates a Linux cpuset for a process, then even if we specify WHOLE_SYSTEM, the allowed_cpuset on OBJ_MACHINE will still accurately reflect the PUs that are in the Linux cpuset where this process is running.
>
> Yes.
>
> But you're talking about "am I bound?" here. My concern was "how does OMPI bind processes?".
> If WHOLE_SYSTEM is passed, you may get more objects in your topology (most objects with allowed_cpuset=0 are removed when the flag is not set). So things like get_nbobjs_by_type() return larger values when you pass the flag, and you can't rely on those values when, for instance, distributing processes among the available cores. Does the OMPI binding code handle this?

Yes, we do. Because we also allow a user to specify a restricted cpuset for us to use, I automatically filter all cpusets at startup to create an "available" set for our internal use.
This is the set I scan when counting the objects available to us. Of course, if a developer doesn't use our internal utilities to get those numbers, they could do something wrong. :-)

All that said, I think using the WHOLE_SYSTEM flag is actually incorrect. We don't need it, as the problem Nadia identified is better solved by correcting the current logic. I'm working on that now; unfortunately, the only slurm machine I can access doesn't have slurm's affinity module activated.

>
> Brice
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
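A minimal C sketch of the two checks discussed in this thread: the "am I bound?" test (the binding is strictly smaller than the allowed cpuset) and counting only objects that still have allowed PUs, which is the concern raised about get_nbobjs_by_type() under WHOLE_SYSTEM. This is not OMPI's or hwloc's code. As an assumption, cpusets are modeled as plain 64-bit masks (one bit per PU), and the names pu_mask, is_bound() and count_available() are hypothetical. Real code would work on the hwloc bitmaps from hwloc_get_cpubind() and the machine object's allowed_cpuset, where the same subset test can be written with hwloc_bitmap_isincluded() and hwloc_bitmap_isequal().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified cpuset: bit i set means PU i is in the set.
 * Real code would use hwloc bitmaps instead of a fixed 64-bit mask. */
typedef uint64_t pu_mask;

/* "Am I bound?": the binding covers no PU outside the allowed set and
 * leaves out at least one allowed PU (i.e. it is a strict subset). */
bool is_bound(pu_mask bound, pu_mask allowed)
{
    bool inside_allowed = (bound & ~allowed) == 0;
    return inside_allowed && bound != allowed;
}

/* Count only objects (e.g. cores) with at least one allowed PU.
 * A raw per-type count taken with WHOLE_SYSTEM may also include
 * objects whose allowed part is empty; those are skipped here. */
size_t count_available(const pu_mask *objs, size_t n, pu_mask allowed)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if ((objs[i] & allowed) != 0)
            count++;
    }
    return count;
}
```

For example, with PUs 0-7 allowed (mask 0xFF), a process bound to PUs 0-1 (0x03) is reported as bound, while one bound to all of 0xFF is not. With four two-PU cores and only PUs 0-3 allowed, count_available() returns 2 rather than 4, which is the filtered number a distribution step should use.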