I have updated the site to reflect this discussion to date. I'm still trying to figure out what to do about low-level libs. For now, I've removed the envars and modified the suggestions.

https://openpmix.github.io/support/faq/avoid-hwloc-dup

Further comment/input is welcome.

> On Feb 3, 2021, at 8:09 AM, Ralph Castain via devel
> <devel@lists.open-mpi.org> wrote:
>
> What if we do this:
>
> - if you are using PMIx v4.1 or above, then there is no problem. Call
> PMIx_Load_topology and we will always return a valid pointer to the topology,
> subject to the caveat that all members of the process (as well as the server)
> must use the same hwloc version.
>
> - if you are using PMIx v4.0 or below, then first do a PMIx_Get for
> PMIX_TOPOLOGY. If "not found", then try to get the shmem info and adopt it.
> If the shmem info isn't found, then do a topology_load to discover the
> topology. Either way, when done, do a PMIx_Store_internal of the
> hwloc_topology_t using the PMIX_TOPOLOGY key.
>
> This still leaves open the question of what to do with low-level libraries
> that really don't want to link against PMIx. I'm not sure what to do there. I
> agree it is "ugly" to pass an addr in the environment, but there really isn't
> any cleaner option that I can see short of asking every library to provide us
> with the ability to pass hwloc_topology_t down to them. Outside of that
> obvious answer, I suppose we could put the hwloc_topology_t address into the
> environment and have them connect that way?
>
>
>> On Feb 3, 2021, at 7:36 AM, Ralph Castain via devel
>> <devel@lists.open-mpi.org> wrote:
>>
>> I guess this begs the question: how does a library detect that the shmem
>> region has already been mapped? If we attempt to map it and fail, does that
>> mean it has already been mapped or that it doesn't exist?
>>
>> It isn't reasonable to expect that all the libraries in a process will
>> coordinate such that they "know" hwloc has been initialized by the main
>> program, for example. So how do they determine that the topology is present,
>> and how do they gain access to it?
>>
>>
>>> On Feb 3, 2021, at 6:07 AM, Brice Goglin via devel
>>> <devel@lists.open-mpi.org> wrote:
>>>
>>> Hello Ralph
>>>
>>> One thing that isn't clear in this document: the hwloc shmem region may
>>> only be mapped *once* per process (because the mmap address is always
>>> the same). Hence, if a library calls adopt() in the process, others will
>>> fail. This applies to the 2nd and 3rd case in "Accessing the HWLOC
>>> topology tree from clients".
>>>
>>> For the 3rd case where low-level libraries don't want to depend on PMIx,
>>> storing the pointer to the topology in an environment variable might be
>>> a (ugly) solution.
>>>
>>> By the way, you may want to specify somewhere that all these libraries
>>> using the topology pointer in the process must use the same hwloc
>>> version (e.g. not 2.0 vs 2.4). shmem_adopt() verifies that the exporter
>>> and importer are compatible. But passing the topology pointer doesn't
>>> provide any way to verify that the caller doesn't use its own
>>> incompatible embedded hwloc.
>>>
>>> Brice
>>>
>>>
>>> On 02/02/2021 at 18:32, Ralph Castain via devel wrote:
>>>> Hi folks
>>>>
>>>> Per today's telecon, here is a link to a description of the HWLOC
>>>> duplication issue for many-core environments and methods by which you
>>>> can mitigate the impact.
>>>>
>>>> https://openpmix.github.io/support/faq/avoid-hwloc-dup
>>>>
>>>> George: for lower-level libs like treematch or HAN, you might want to
>>>> look at the envar method (described about half-way down the page) to
>>>> avoid directly linking those libraries against PMIx. That wouldn't be
>>>> a problem while inside OMPI, but could be an issue if people want to
>>>> use them in a non-PMIx environment.
>>>>
>>>> Ralph
>>>>
>>>
>>
>>
>
>
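
For anyone who wants to see what the "address in the environment" idea from this thread would look like in practice, here is a rough sketch in plain C. It is only an illustration under several assumptions: the variable name EXAMPLE_HWLOC_TOPOLOGY_ADDR and both helper functions are made up for this example and are not defined by PMIx or hwloc, and a small stand-in struct is used in place of hwloc_topology_t so the code builds without hwloc. The address is only meaningful inside the process that published it, and nothing here checks that the publisher and the consumer were built against the same hwloc version, which is the caveat raised earlier in the thread.

```c
#define _POSIX_C_SOURCE 200112L   /* for setenv() */
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variable name, not defined by PMIx or hwloc. */
#define TOPO_ADDR_ENVAR "EXAMPLE_HWLOC_TOPOLOGY_ADDR"

/* Stand-in for hwloc_topology_t so the sketch compiles without hwloc. */
typedef struct example_topology {
    int nobjs;
} example_topology_t;

/* Publisher side (e.g. the main program): record the address of an
 * already-loaded topology in the environment as a hex string.
 * Returns 0 on success, -1 on failure (as setenv does). */
int publish_topology_addr(const example_topology_t *topo)
{
    char buf[2 + 2 * sizeof(uintptr_t) + 1];   /* "0x" + hex digits + NUL */

    snprintf(buf, sizeof(buf), "0x%" PRIxPTR, (uintptr_t)topo);
    return setenv(TOPO_ADDR_ENVAR, buf, 1);
}

/* Consumer side (e.g. a low-level library): recover the pointer.
 * Returns NULL if the variable is unset or does not parse as an address,
 * in which case the caller would have to discover the topology itself. */
example_topology_t *lookup_topology_addr(void)
{
    const char *s = getenv(TOPO_ADDR_ENVAR);
    char *end = NULL;
    uintmax_t v;

    if (s == NULL || *s == '\0') {
        return NULL;
    }
    errno = 0;
    v = strtoumax(s, &end, 16);   /* base 16 accepts the "0x" prefix */
    if (errno != 0 || end == s || *end != '\0' || v > UINTPTR_MAX) {
        return NULL;
    }
    return (example_topology_t *)(uintptr_t)v;
}
```

The program that owns the topology would call publish_topology_addr() once after loading it, and a library would call lookup_topology_addr() and fall back to its own discovery on NULL. The consumer has no way to tell a stale or foreign address from a good one, which is part of why this approach is called "ugly" above.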