What if we do this:

- if you are using PMIx v4.1 or above, then there is no problem. Call PMIx_Load_topology and we will always return a valid pointer to the topology, subject to the caveat that all members of the process (as well as the server) must use the same hwloc version.
- if you are using PMIx v4.0 or below, then first do a PMIx_Get for PMIX_TOPOLOGY. If "not found", then try to get the shmem info and adopt it. If the shmem info isn't found, then do a topology_load to discover the topology. Either way, when done, do a PMIx_Store_internal of the hwloc_topology_t using the PMIX_TOPOLOGY key.

This still leaves open the question of what to do with low-level libraries that really don't want to link against PMIx. I'm not sure what to do there. I agree it is "ugly" to pass an addr in the environment, but there really isn't any cleaner option that I can see short of asking every library to provide us with the ability to pass hwloc_topology_t down to them. Outside of that obvious answer, I suppose we could put the hwloc_topology_t address into the environment and have them connect that way?

> On Feb 3, 2021, at 7:36 AM, Ralph Castain via devel
> <devel@lists.open-mpi.org> wrote:
>
> I guess this begs the question: how does a library detect that the shmem
> region has already been mapped? If we attempt to map it and fail, does that
> mean it has already been mapped or that it doesn't exist?
>
> It isn't reasonable to expect that all the libraries in a process will
> coordinate such that they "know" hwloc has been initialized by the main
> program, for example. So how do they determine that the topology is present,
> and how do they gain access to it?
>
>
>> On Feb 3, 2021, at 6:07 AM, Brice Goglin via devel
>> <devel@lists.open-mpi.org> wrote:
>>
>> Hello Ralph
>>
>> One thing that isn't clear in this document: the hwloc shmem region may
>> only be mapped *once* per process (because the mmap address is always
>> the same). Hence, if a library calls adopt() in the process, others will
>> fail. This applies to the 2nd and 3rd case in "Accessing the HWLOC
>> topology tree from clients".
>>
>> For the 3rd case where low-level libraries don't want to depend on PMIx,
>> storing the pointer to the topology in an environment variable might be
>> an (ugly) solution.
>>
>> By the way, you may want to specify somewhere that all these libraries
>> using the topology pointer in the process must use the same hwloc
>> version (e.g. not 2.0 vs 2.4). shmem_adopt() verifies that the exporter
>> and importer are compatible. But passing the topology pointer doesn't
>> provide any way to verify that the caller doesn't use its own
>> incompatible embedded hwloc.
>>
>> Brice
>>
>>
>> On 02/02/2021 at 18:32, Ralph Castain via devel wrote:
>>> Hi folks
>>>
>>> Per today's telecon, here is a link to a description of the HWLOC
>>> duplication issue for many-core environments and methods by which you
>>> can mitigate the impact.
>>>
>>> https://openpmix.github.io/support/faq/avoid-hwloc-dup
>>>
>>> George: for lower-level libs like treematch or HAN, you might want to
>>> look at the envar method (described about half-way down the page) to
>>> avoid directly linking those libraries against PMIx. That wouldn't be
>>> a problem while inside OMPI, but could be an issue if people want to
>>> use them in a non-PMIx environment.
>>>
>>> Ralph
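
For reference, the "put the topology address into the environment" idea discussed in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: the variable name EXAMPLE_HWLOC_TOPOLOGY_ADDR and both helper functions are made up for this example, and example_topology_t is a stand-in for hwloc_topology_t so the snippet builds without hwloc. The address is only meaningful inside the process that published it, and nothing here checks that publisher and consumer use the same hwloc version, which is exactly the caveat Brice raised.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() */
#endif
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variable name, chosen for this sketch only. */
#define TOPO_ADDR_ENVAR "EXAMPLE_HWLOC_TOPOLOGY_ADDR"

/* Stand-in for hwloc_topology_t (an opaque pointer type), so the
 * sketch compiles without hwloc. */
typedef struct example_topology *example_topology_t;

/* Publisher side: record the address of an already-loaded topology
 * in the environment. Returns 0 on success, -1 on failure. */
static int publish_topology_addr(example_topology_t topo)
{
    char buf[2 + 2 * sizeof(uintptr_t) + 1];   /* "0x" + hex digits + NUL */
    int n = snprintf(buf, sizeof(buf), "0x%" PRIxPTR,
                     (uintptr_t)(void *)topo);
    assert(n > 0 && (size_t)n < sizeof(buf));
    return setenv(TOPO_ADDR_ENVAR, buf, 1);
}

/* Consumer side: return the published pointer, or NULL if the
 * variable is unset, empty, or not a well-formed hex address. */
static example_topology_t lookup_topology_addr(void)
{
    const char *s = getenv(TOPO_ADDR_ENVAR);
    char *end = NULL;
    uintmax_t v;

    if (s == NULL || *s == '\0') {
        return NULL;
    }
    errno = 0;
    v = strtoumax(s, &end, 16);
    if (errno != 0 || *end != '\0' || v > UINTPTR_MAX) {
        return NULL;
    }
    return (example_topology_t)(void *)(uintptr_t)v;
}
```

The main program (or whichever component loaded the topology first) would call publish_topology_addr() once, and a low-level library would call lookup_topology_addr() and fall back to its own discovery when it gets NULL. A child process that inherits the environment must not trust the value, since the pointer refers to the parent's address space.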