I should add: this does raise the question of how a proc “discovers” its resource constraints without having access to the hwloc tree. One possible solution: the RM already knows the restrictions, so it could pass them down at proc startup (e.g., as part of the PMIx info). We could pass whatever info hwloc would like passed into its calls; it doesn’t have to be something “understandable” by the proc itself.
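To make the idea a bit more concrete, here is a rough sketch of what a constraint-aware search over a single shared tree might look like. It is written in Python rather than C purely to show the shape of the call, and nothing in it is actual hwloc or PMIx API: `Obj`, `build_toy_tree`, `decode_constraint`, `find_objects`, and the hex-string token format are all made up for illustration. The point is only that the proc forwards an opaque token it received at startup, and the search prunes anything outside the allowed set without copying or modifying the tree.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Obj:
    """Toy stand-in for a topology object; not the hwloc data structure."""
    kind: str                 # e.g. "Machine", "Package", "Core", "PU"
    cpuset: int               # bitmask of PU indices covered by this object
    children: List["Obj"] = field(default_factory=list)


def build_toy_tree(packages: int, cores_per_package: int) -> Obj:
    """Build the single node-wide tree that the RM daemon would host."""
    machine = Obj("Machine", 0)
    pu = 0
    for _ in range(packages):
        package = Obj("Package", 0)
        for _ in range(cores_per_package):
            core = Obj("Core", 1 << pu, [Obj("PU", 1 << pu)])
            package.children.append(core)
            package.cpuset |= core.cpuset
            pu += 1
        machine.children.append(package)
        machine.cpuset |= package.cpuset
    return machine


def decode_constraint(token: str) -> int:
    """Turn the opaque token handed down by the RM into an allowed-PU mask.

    Assumption: the token is a hex string. The proc never interprets it;
    it only forwards it to the search call.
    """
    return int(token, 16)


def find_objects(root: Obj, kind: str, token: str) -> Iterator[Obj]:
    """Walk the shared tree read-only and yield objects of `kind` that
    overlap the allowed PUs. Subtrees with no overlap are pruned."""
    allowed = decode_constraint(token)
    stack = [root]
    while stack:
        obj = stack.pop()
        if obj.cpuset & allowed == 0:
            continue              # nothing in this subtree is available
        if obj.kind == kind:
            yield obj
        stack.extend(reversed(obj.children))


tree = build_toy_tree(packages=2, cores_per_package=4)   # PUs 0-7
token = "30"                                              # RM allows PUs 4 and 5
cores = list(find_objects(tree, "Core", token))
packages_seen = list(find_objects(tree, "Package", token))
```

With the token above, the proc sees only the two cores covering PUs 4 and 5 and only the second package, even though the tree itself still describes the whole machine. A real implementation would presumably carry something richer than a CPU mask (memory restrictions, for instance), which is another reason to keep the token opaque to the proc.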
> On Oct 21, 2016, at 8:15 AM, r...@open-mpi.org wrote:
>
> Hmmm...I think maybe we are only seeing a small portion of the picture here.
> There are two pieces of the problem when looking at large SMPs:
>
> * time required for discovery - your proposal is attempting to address that,
> assuming that the RM daemon collects the topology and then communicates it to
> each process (which is today’s method)
>
> * memory footprint. We are seeing over 40MBytes being consumed by hwloc
> topologies on fully loaded KNL machines, which is a disturbing number
>
> Where we are headed is to having only one copy of the hwloc topology tree on
> a node, stored in a shared memory segment hosted by the local RM daemon.
> Procs will then access that tree to obtain any required info. Thus, we are
> less interested in each process creating its own tree based on an XML
> representation passed to it by the RM, and more interested in having the
> hwloc search algorithms correctly handle any resource restrictions when
> searching the RM’s tree.
>
> In other words, rather than (or perhaps, in addition to?) filtering the XML,
> we’d prefer to see some modification of the search APIs to allow a proc to
> pass in its resource constraints, and have the search algorithm properly
> consider them when returning the result. This eliminates all the XML
> conversion overhead, and resolves the memory footprint issue.
>
> HTH
> Ralph
>
>> On Oct 21, 2016, at 5:16 AM, Brice Goglin <brice.gog...@inria.fr> wrote:
>>
>> Hello
>>
>> Based on recent discussion about hwloc_topology_load() being slow on
>> some "large" platforms (almost 1 second on KNL), here's a new feature
>> proposal:
>>
>> We've been recommending the use of XML to avoid multiple expensive
>> discovery: Export to XML once at boot, and reload from XML for each
>> actual process using hwloc. The main limitation is cgroups: resource
>> managers use cgroups to restrict the processors and memory that are
>> actually available to each job. So the topology of different jobs on the
>> same machine is actually slightly different from the main XML that
>> contained everything when it was created outside of cgroups during boot.
>>
>> So we're looking at adding a new topology flag that loads the entire
>> machine from XML (or synthetic) and applies restrictions from the
>> local/native operating system.
>>
>> Details at https://github.com/open-mpi/hwloc/pull/212
>> Comments welcome here or there.
>>
>> Brice
>>
>> _______________________________________________
>> hwloc-devel mailing list
>> hwloc-devel@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/hwloc-devel