Hello,

I recently got the first report of what we knew would happen sooner or
later: plugin namespace issues caused by somebody loading a
plugin-enabled hwloc as a plugin. This one comes from OpenCL (which uses
plugins to select implementations) because one implementation depends on
hwloc. What happens is that hwloc cannot use its plugins because they
need some functions from the hwloc core, but they cannot find them
because hwloc was loaded in a private namespace within an OpenCL plugin.

What's annoying is that the program seems to load plugins completely
fine but later aborts at use time because of the missing symbol (and
from what I see in the ltdl documentation, there's no portable/easy way
to force load-time symbol lookup).

One easy workaround is to set HWLOC_PLUGINS_PATH=/none in the
environment, so that no hwloc plugin is found. But this may remove some
features.

The proper fix for now is to rebuild hwloc without plugins. So we don't
have to hurry and fix this for v1.7.2, but we can still look at it for v1.8.

Two solutions were envisioned earlier:
* Have hwloc plugins depend on libhwloc. Jeff didn't like it because it
will cause multiple instances of libhwloc to be loaded, which will break
if we have internal/global state in libhwloc. I think we actually have
no such internal state, but this approach may still be dangerous.
* Have the core tell plugins where core symbols are. This basically
means doing our own symbol lookup manually. Possible issues:
  + We have maaaaaaany symbols, and it's not easy to define which ones
are available to plugins and which ones are not. See the quick review
in [1].
  + Plugins won't be able to call hwloc functions directly anymore, and
they won't be able to use inline helpers anymore (since those often call
hwloc core functions explicitly).
  + Need to implement that without causing future ABI breaks when
extending the API available to plugins. Maybe have plugins pass an
array of strings listing which symbols they need.
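The second approach could look roughly like the C sketch below. It is only an illustration under stated assumptions: the core keeps a whitelist table mapping names to function pointers (in practice it would be filled from a review like [1]), and each plugin passes the array of symbol names it needs and receives pointers back. All names here (resolve_symbols, core_get_nbobjs, core_obj_type_string, struct exported_symbol) are hypothetical stand-ins, not existing hwloc API. A side benefit is that a missing symbol is reported when the plugin is loaded rather than at use time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Generic function pointer type used to hand out core symbols. C allows
 * converting between function pointer types, as long as the caller casts
 * back to the real prototype before calling. */
typedef void (*generic_fn)(void);

/* Hypothetical stand-ins for core functions (not the real hwloc API). */
static int core_get_nbobjs(int depth) { return depth + 1; }
static const char *core_obj_type_string(int type)
{
  return type == 0 ? "Machine" : "Unknown";
}

struct exported_symbol {
  const char *name;
  generic_fn fn;
};

/* Hypothetical whitelist of core symbols that plugins may request.
 * Entries can be appended later without breaking existing plugins,
 * since plugins only ever look symbols up by name. */
static const struct exported_symbol exported[] = {
  { "core_get_nbobjs",      (generic_fn) core_get_nbobjs },
  { "core_obj_type_string", (generic_fn) core_obj_type_string },
};

/* Resolve each name in names[0..n-1] into out[i]. Returns 0 if all names
 * were found, or -1 as soon as one is missing, so the core can reject the
 * plugin when loading it instead of aborting later at use time. */
int resolve_symbols(const char *const names[], size_t n, generic_fn out[])
{
  assert(names != NULL && out != NULL);
  for (size_t i = 0; i < n; i++) {
    out[i] = NULL;
    for (size_t j = 0; j < sizeof(exported) / sizeof(exported[0]); j++) {
      if (strcmp(names[i], exported[j].name) == 0) {
        out[i] = exported[j].fn;
        break;
      }
    }
    if (out[i] == NULL) {
      fprintf(stderr, "plugin requested unavailable symbol %s\n", names[i]);
      return -1;
    }
  }
  return 0;
}
```

A plugin would then call, for instance, the resolved core_get_nbobjs pointer after casting it back to int (*)(int). This is exactly why inline helpers that call core functions directly would no longer work from plugin code.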

Other ideas?

Brice


[1] Review of public symbols:

Things that shouldn't be available to plugins:
* init/load/destroy
* topology_set_*() topology_ignore_*() topology_restrict()
* XML export/import
* cpubind/membind/last_cpu_location (as well as alloc/free)
* custom_insert_*

Things that should be available:
* hwloc/plugins.h
* other insert() functions (not sure)
* most of our get() functions
* most stringification functions
* minor other things
(about 30 total)

hwloc/bitmap.h is the biggest problem: plugins should be allowed to use
all of its functions, but there are maaaaany of them. Splitting
hwloc-bitmap.so out of hwloc.so would be an easy way to solve this. The
bitmap API is totally independent from the hwloc core anyway.

Brice
