The text looks correct to me. I don't have any better suggestion for now.

I am thinking about adding an adopt() flag to say "adopt it, or give me a
pointer to the already adopted one", but it's not clear to me how to
implement this safely. I opened an hwloc issue to discuss the details of
making sure both adopt() calls point to the very same shmem topology
file: https://github.com/open-mpi/hwloc/issues/449

Brice


On 4 Feb 2021 at 01:28, Ralph Castain via devel wrote:
> I have updated the site to reflect this discussion to-date. I'm still trying 
> to figure out what to do about low-level libs. For now, I've removed the 
> envars and modified suggestions.
>
> https://openpmix.github.io/support/faq/avoid-hwloc-dup
>
> Further comment/input is welcome.
>
>
>> On Feb 3, 2021, at 8:09 AM, Ralph Castain via devel 
>> <devel@lists.open-mpi.org> wrote:
>>
>> What if we do this:
>>
>> - if you are using PMIx v4.1 or above, then there is no problem. Call 
>> PMIx_Load_topology and we will always return a valid pointer to the 
>> topology, subject to the caveat that all members of the process (as well as 
>> the server) must use the same hwloc version.
>>
>> - if you are using PMIx v4.0 or below, then first do a PMIx_Get for 
>> PMIX_TOPOLOGY. If "not found", then try to get the shmem info and adopt it. 
>> If the shmem info isn't found, then do a topology_load to discover the 
>> topology. Either way, when done, do a PMIx_Store_internal of the 
>> hwloc_topology_t using the PMIX_TOPOLOGY key.
>>
>> This still leaves open the question of what to do with low-level libraries 
>> that really don't want to link against PMIx. I'm not sure what to do there. 
>> I agree it is "ugly" to pass an addr in the environment, but there really 
>> isn't any cleaner option that I can see short of asking every library to 
>> provide us with the ability to pass hwloc_topology_t down to them. Outside 
>> of that obvious answer, I suppose we could put the hwloc_topology_t address 
>> into the environment and have them connect that way?
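
[Editorial sketch] The environment-variable idea raised above can be illustrated with a short round trip. This is only a minimal sketch under stated assumptions: the variable name EXAMPLE_HWLOC_TOPOLOGY_ADDR and both helper functions are hypothetical and are not defined by PMIx or hwloc, and the topology handle is treated as an opaque pointer. The address is only meaningful inside the process that published it; a child process that inherits the variable would see a stale value.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() */
#endif
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variable name; nothing in PMIx or hwloc defines it. */
#define TOPO_ENVAR "EXAMPLE_HWLOC_TOPOLOGY_ADDR"

/* Publish the address of an already-loaded topology so that other
 * libraries in the same process can find it.
 * Returns 0 on success, -1 on failure (as setenv does). */
static int publish_topology_addr(const void *topo)
{
    /* "0x" + two hex digits per byte + terminating NUL */
    char buf[2 + 2 * sizeof(uintptr_t) + 1];
    snprintf(buf, sizeof(buf), "0x%" PRIxPTR, (uintptr_t)topo);
    return setenv(TOPO_ENVAR, buf, 1);
}

/* Return the published address, or NULL if the variable is unset,
 * empty, or not a well-formed hexadecimal number. */
static void *lookup_topology_addr(void)
{
    const char *s = getenv(TOPO_ENVAR);
    if (s == NULL || *s == '\0')
        return NULL;

    char *end = NULL;
    errno = 0;
    uintmax_t v = strtoumax(s, &end, 16);   /* base 16 accepts "0x" */
    if (errno != 0 || *end != '\0' || v > UINTPTR_MAX)
        return NULL;
    return (void *)(uintptr_t)v;
}
```

In this sketch the main program (or the first library to obtain the topology) would call publish_topology_addr() once, and a low-level library would call lookup_topology_addr() and fall back to its own discovery when it returns NULL. Nothing here checks that the consumer was built against a compatible hwloc.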
>>
>>
>>> On Feb 3, 2021, at 7:36 AM, Ralph Castain via devel 
>>> <devel@lists.open-mpi.org> wrote:
>>>
>>> I guess this begs the question: how does a library detect that the shmem 
>>> region has already been mapped? If we attempt to map it and fail, does that 
>>> mean it has already been mapped or that it doesn't exist?
>>>
>>> It isn't reasonable to expect that all the libraries in a process will 
>>> coordinate such that they "know" hwloc has been initialized by the main 
>>> program, for example. So how do they determine that the topology is 
>>> present, and how do they gain access to it?
>>>
>>>
>>>> On Feb 3, 2021, at 6:07 AM, Brice Goglin via devel 
>>>> <devel@lists.open-mpi.org> wrote:
>>>>
>>>> Hello Ralph
>>>>
>>>> One thing that isn't clear in this document: the hwloc shmem region may
>>>> only be mapped *once* per process (because the mmap address is always
>>>> the same). Hence, if a library calls adopt() in the process, others will
>>>> fail. This applies to the 2nd and 3rd case in "Accessing the HWLOC
>>>> topology tree from clients".
>>>>
>>>> For the 3rd case where low-level libraries don't want to depend on PMIx,
>>>> storing the pointer to the topology in an environment variable might be
>>>> an (ugly) solution.
>>>>
>>>> By the way, you may want to specify somewhere that all these libraries
>>>> using the topology pointer in the process must use the same hwloc
>>>> version (e.g. not 2.0 vs 2.4). shmem_adopt() verifies that the exporter
>>>> and importer are compatible. But passing the topology pointer doesn't
>>>> provide any way to verify that the caller doesn't use its own
>>>> incompatible embedded hwloc.
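
[Editorial sketch] One way a library receiving a raw topology pointer could guard against the mismatch described above is to compare hwloc API versions before touching the pointer. This is a sketch under assumptions, not something the thread or hwloc prescribes: it assumes the publisher's API version is communicated alongside the pointer by some means, and it treats any difference in major or minor version as incompatible, which is stricter than hwloc's actual rules may be. The helper names are hypothetical. The only hwloc fact relied on is that the API version is packed as (major << 16) + (minor << 8) + release, for example 0x00020400 for 2.4.0.

```c
#include <stdbool.h>

/* Unpack the fields of an hwloc-style API version number,
 * which is laid out as (major << 16) + (minor << 8) + release. */
static unsigned api_major(unsigned version) { return version >> 16; }
static unsigned api_minor(unsigned version) { return (version >> 8) & 0xffu; }

/* Hypothetical policy: accept a shared topology pointer only when the
 * publisher and the consumer agree on major and minor version.
 * The release field is ignored. */
static bool topology_pointer_is_usable(unsigned publisher_api,
                                       unsigned consumer_api)
{
    return api_major(publisher_api) == api_major(consumer_api)
        && api_minor(publisher_api) == api_minor(consumer_api);
}
```

A consumer linked against hwloc would typically pass the value of hwloc_get_api_version() (or the compile-time HWLOC_API_VERSION) as consumer_api; how publisher_api reaches it is left open here.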
>>>>
>>>> Brice
>>>>
>>>>
>>>> On 2 Feb 2021 at 18:32, Ralph Castain via devel wrote:
>>>>> Hi folks
>>>>>
>>>>> Per today's telecon, here is a link to a description of the HWLOC
>>>>> duplication issue for many-core environments and methods by which you
>>>>> can mitigate the impact.
>>>>>
>>>>> https://openpmix.github.io/support/faq/avoid-hwloc-dup
>>>>>
>>>>> George: for lower-level libs like treematch or HAN, you might want to
>>>>> look at the envar method (described about half-way down the page) to
>>>>> avoid directly linking those libraries against PMIx. That wouldn't be
>>>>> a problem while inside OMPI, but could be an issue if people want to
>>>>> use them in a non-PMIx environment.
>>>>>
>>>>> Ralph
>>>>>
>>>
>>
>
