What if we do this:

- if you are using PMIx v4.1 or above, then there is no problem. Call 
PMIx_Load_topology and we will always return a valid pointer to the topology, 
subject to the caveat that all libraries in the process (as well as the server) 
must use the same hwloc version.

- if you are using PMIx v4.0 or below, then first do a PMIx_Get for 
PMIX_TOPOLOGY. If that returns "not found", then try to get the shmem info and 
adopt it. If the shmem info isn't found either, then do an hwloc_topology_load 
to discover the topology. Whichever fallback you used, when done, do a 
PMIx_Store_internal of the hwloc_topology_t using the PMIX_TOPOLOGY key.

This still leaves open the question of what to do with low-level libraries that 
really don't want to link against PMIx. I'm not sure what to do there. I agree 
it is "ugly" to pass an address in the environment, but I can't see any cleaner 
option short of asking every library to provide a way for us to pass the 
hwloc_topology_t down to it. Failing that, I suppose we could put the 
hwloc_topology_t address into the environment and have them connect that way?
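
To make the envar idea concrete, here is a rough sketch of what the two sides 
might look like. It uses only libc, treats the topology as an opaque pointer, 
and the variable name EXAMPLE_HWLOC_TOPOLOGY_ADDR is made up for this example; 
it is not something PMIx or hwloc defines. It also does nothing about the 
hwloc version caveat, since a bare pointer can't carry that information.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() */
#endif
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variable name, chosen only for this sketch. */
#define TOPO_ADDR_ENVAR "EXAMPLE_HWLOC_TOPOLOGY_ADDR"

/* Publish the address of an already-loaded topology so that other
 * libraries in the SAME process can find it.
 * Returns 0 on success, -1 on failure. */
static int publish_topology_addr(const void *topo)
{
    char buf[2 + 2 * sizeof(uintptr_t) + 1];   /* "0x" + hex digits + NUL */
    int n;

    if (topo == NULL) {
        return -1;
    }
    n = snprintf(buf, sizeof(buf), "0x%" PRIxPTR, (uintptr_t)topo);
    assert(n > 0 && (size_t)n < sizeof(buf));
    return setenv(TOPO_ADDR_ENVAR, buf, 1);
}

/* Return the published pointer, or NULL if the variable is unset,
 * empty, or not a well-formed hexadecimal address. */
static void *lookup_topology_addr(void)
{
    const char *s = getenv(TOPO_ADDR_ENVAR);
    char *end = NULL;
    uintmax_t v;

    if (s == NULL || *s == '\0') {
        return NULL;
    }
    errno = 0;
    v = strtoumax(s, &end, 16);   /* base 16 accepts the "0x" prefix */
    if (errno != 0 || *end != '\0' || v > UINTPTR_MAX) {
        return NULL;
    }
    return (void *)(uintptr_t)v;
}
```

The first library to obtain the topology would call publish_topology_addr(), 
and a low-level library would call lookup_topology_addr() and cast the result 
to hwloc_topology_t, falling back to its own discovery on NULL. One thing to 
watch: the environment is inherited by child processes, where that address is 
meaningless, so a real scheme would need some guard against that.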


> On Feb 3, 2021, at 7:36 AM, Ralph Castain via devel 
> <devel@lists.open-mpi.org> wrote:
> 
> I guess this raises the question: how does a library detect that the shmem 
> region has already been mapped? If we attempt to map it and fail, does that 
> mean it has already been mapped or that it doesn't exist?
> 
> It isn't reasonable to expect that all the libraries in a process will 
> coordinate such that they "know" hwloc has been initialized by the main 
> program, for example. So how do they determine that the topology is present, 
> and how do they gain access to it?
> 
> 
>> On Feb 3, 2021, at 6:07 AM, Brice Goglin via devel 
>> <devel@lists.open-mpi.org> wrote:
>> 
>> Hello Ralph
>> 
>> One thing that isn't clear in this document: the hwloc shmem region may
>> only be mapped *once* per process (because the mmap address is always
>> the same). Hence, if a library calls adopt() in the process, others will
>> fail. This applies to the 2nd and 3rd case in "Accessing the HWLOC
>> topology tree from clients".
>> 
>> For the 3rd case where low-level libraries don't want to depend on PMIx,
>> storing the pointer to the topology in an environment variable might be
>> an (ugly) solution.
>> 
>> By the way, you may want to specify somewhere that all these libraries
>> using the topology pointer in the process must use the same hwloc
>> version (e.g. not 2.0 vs 2.4). shmem_adopt() verifies that the exporter
>> and importer are compatible. But passing the topology pointer doesn't
>> provide any way to verify that the caller doesn't use its own
>> incompatible embedded hwloc.
>> 
>> Brice
>> 
>> 
>> On 02/02/2021 at 18:32, Ralph Castain via devel wrote:
>>> Hi folks
>>> 
>>> Per today's telecon, here is a link to a description of the HWLOC
>>> duplication issue for many-core environments and methods by which you
>>> can mitigate the impact.
>>> 
>>> https://openpmix.github.io/support/faq/avoid-hwloc-dup
>>> 
>>> George: for lower-level libs like treematch or HAN, you might want to
>>> look at the envar method (described about half-way down the page) to
>>> avoid directly linking those libraries against PMIx. That wouldn't be
>>> a problem while inside OMPI, but could be an issue if people want to
>>> use them in a non-PMIx environment.
>>> 
>>> Ralph
>>> 
>> 
> 
> 

