I have updated the site to reflect this discussion to date. I'm still trying to 
figure out what to do about low-level libs. For now, I've removed the envars 
and modified the suggestions.

https://openpmix.github.io/support/faq/avoid-hwloc-dup

Further comment/input is welcome.


> On Feb 3, 2021, at 8:09 AM, Ralph Castain via devel 
> <devel@lists.open-mpi.org> wrote:
> 
> What if we do this:
> 
> - if you are using PMIx v4.1 or above, then there is no problem. Call 
> PMIx_Load_topology and we will always return a valid pointer to the topology, 
> subject to the caveat that all members of the process (as well as the server) 
> must use the same hwloc version.
> 
> - if you are using PMIx v4.0 or below, then first do a PMIx_Get for 
> PMIX_TOPOLOGY. If "not found", then try to get the shmem info and adopt it. 
> If the shmem info isn't found, then do a topology_load to discover the 
> topology. Either way, when done, do a PMIx_Store_internal of the 
> hwloc_topology_t using the PMIX_TOPOLOGY key.
> 
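The lookup order proposed above for PMIx v4.0 and below can be sketched as a simple fallback chain. This is only a rough illustration, not the PMIx implementation: `example_topology_t` stands in for `hwloc_topology_t`, the three function pointers stand in for the PMIx_Get of PMIX_TOPOLOGY, the shmem adoption, and the local topology load, and `stored_topology` plays the role of the value written by PMIx_Store_internal. All of those names, and the demo stubs at the bottom, are hypothetical so that the sketch compiles without PMIx or hwloc installed.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for hwloc_topology_t so the sketch compiles without hwloc. */
typedef struct example_topology { int id; } example_topology_t;

/* A lookup returns the topology, or NULL to mean "not found". */
typedef example_topology_t *(*topo_lookup_fn)(void);

/* Plays the role of the PMIX_TOPOLOGY entry set via PMIx_Store_internal. */
static example_topology_t *stored_topology = NULL;

example_topology_t *resolve_topology(topo_lookup_fn get_from_pmix,
                                     topo_lookup_fn adopt_from_shmem,
                                     topo_lookup_fn discover)
{
    assert(get_from_pmix && adopt_from_shmem && discover);

    /* Step 1: ask PMIx first; another library may already have stored it. */
    example_topology_t *topo = get_from_pmix();
    if (topo != NULL) {
        return topo;
    }

    /* Step 2: "not found", so try to adopt the shmem region. */
    topo = adopt_from_shmem();

    /* Step 3: no shmem info either, so discover the topology locally. */
    if (topo == NULL) {
        topo = discover();
    }

    /* Step 4: either way, store the result so later callers stop at step 1. */
    if (topo != NULL) {
        stored_topology = topo;
    }
    return topo;
}

/* Hypothetical demo stubs standing in for the real lookups. */
static example_topology_t shmem_topo = { 1 };
static example_topology_t local_topo = { 2 };
static example_topology_t *lookup_stored(void)  { return stored_topology; }
static example_topology_t *lookup_missing(void) { return NULL; }
static example_topology_t *lookup_shmem(void)   { return &shmem_topo; }
static example_topology_t *lookup_local(void)   { return &local_topo; }
```

The point of storing the pointer at the end is that only the first caller in the process ever reaches the adopt step, which matters because the shmem region can only be mapped once per process.
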
> This still leaves open the question of what to do with low-level libraries 
> that really don't want to link against PMIx. I'm not sure what to do there. I 
> agree it is "ugly" to pass an addr in the environment, but there really isn't 
> any cleaner option that I can see short of asking every library to provide us 
> with the ability to pass hwloc_topology_t down to them. Outside of that 
> obvious answer, I suppose we could put the hwloc_topology_t address into the 
> environment and have them connect that way?
> 
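For reference, passing an address through the environment could look roughly like the sketch below. It assumes a POSIX system (`setenv`/`getenv`), and the variable name `EXAMPLE_HWLOC_TOPOLOGY_ADDR` and both function names are made up for illustration; they are not defined by PMIx or hwloc. The address is only meaningful inside the process that published it, and, as noted elsewhere in this thread, nothing here verifies that the reader was built against the same hwloc version as the writer.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() */
#endif
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variable name, chosen only for this example. */
#define EXAMPLE_TOPO_ENVAR "EXAMPLE_HWLOC_TOPOLOGY_ADDR"

/* Publish the address of an in-process object as a hex string.
 * Returns 0 on success, -1 on failure (same convention as setenv). */
int publish_topology_addr(const void *topo)
{
    char buf[2 + 2 * sizeof(uintptr_t) + 1];   /* "0x" + hex digits + NUL */
    int n = snprintf(buf, sizeof(buf), "0x%" PRIxPTR, (uintptr_t)topo);
    assert(n > 0 && (size_t)n < sizeof(buf));
    return setenv(EXAMPLE_TOPO_ENVAR, buf, 1);
}

/* Read the address back. Returns NULL if the variable is unset or malformed. */
void *lookup_topology_addr(void)
{
    const char *s = getenv(EXAMPLE_TOPO_ENVAR);
    if (s == NULL || *s == '\0') {
        return NULL;
    }

    char *end = NULL;
    errno = 0;
    uintmax_t v = strtoumax(s, &end, 16);       /* base 16 accepts "0x" */
    if (errno != 0 || end == s || *end != '\0' || v > UINTPTR_MAX) {
        return NULL;
    }
    return (void *)(uintptr_t)v;
}
```

A low-level library would call `lookup_topology_addr()` and fall back to its own discovery when it gets NULL, which keeps it free of any link-time dependency on PMIx.
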
> 
>> On Feb 3, 2021, at 7:36 AM, Ralph Castain via devel 
>> <devel@lists.open-mpi.org> wrote:
>> 
>> I guess this raises the question: how does a library detect that the shmem 
>> region has already been mapped? If we attempt to map it and fail, does that 
>> mean it has already been mapped or that it doesn't exist?
>> 
>> It isn't reasonable to expect that all the libraries in a process will 
>> coordinate such that they "know" hwloc has been initialized by the main 
>> program, for example. So how do they determine that the topology is present, 
>> and how do they gain access to it?
>> 
>> 
>>> On Feb 3, 2021, at 6:07 AM, Brice Goglin via devel 
>>> <devel@lists.open-mpi.org> wrote:
>>> 
>>> Hello Ralph
>>> 
>>> One thing that isn't clear in this document: the hwloc shmem region may
>>> only be mapped *once* per process (because the mmap address is always
>>> the same). Hence, if a library calls adopt() in the process, others will
>>> fail. This applies to the 2nd and 3rd case in "Accessing the HWLOC
>>> topology tree from clients".
>>> 
>>> For the 3rd case where low-level libraries don't want to depend on PMIx,
>>> storing the pointer to the topology in an environment variable might be
>>> an (ugly) solution.
>>> 
>>> By the way, you may want to specify somewhere that all these libraries
>>> using the topology pointer in the process must use the same hwloc
>>> version (e.g. not 2.0 vs 2.4). shmem_adopt() verifies that the exporter
>>> and importer are compatible. But passing the topology pointer doesn't
>>> provide any way to verify that the caller doesn't use its own
>>> incompatible embedded hwloc.
>>> 
>>> Brice
>>> 
>>> 
>>> On 02/02/2021 at 18:32, Ralph Castain via devel wrote:
>>>> Hi folks
>>>> 
>>>> Per today's telecon, here is a link to a description of the HWLOC
>>>> duplication issue for many-core environments and methods by which you
>>>> can mitigate the impact.
>>>> 
>>>> https://openpmix.github.io/support/faq/avoid-hwloc-dup
>>>> 
>>>> George: for lower-level libs like treematch or HAN, you might want to
>>>> look at the envar method (described about half-way down the page) to
>>>> avoid directly linking those libraries against PMIx. That wouldn't be
>>>> a problem while inside OMPI, but could be an issue if people want to
>>>> use them in a non-PMIx environment.
>>>> 
>>>> Ralph
>>>> 
>>> 
>> 
>> 
> 
> 

