Hello,

I recently got the first report of what we knew would happen one day or another: plugin namespace issues caused by somebody loading a plugin-enabled hwloc as a plugin. It comes from OpenCL (which uses plugins to select implementations) because one implementation depends on hwloc. What happens is that hwloc fails to load its plugins because they need some functions from the hwloc core, but they cannot find them because hwloc was loaded in a private namespace within an OpenCL plugin.

What's annoying is that the program seems to load plugins completely fine but later aborts at use-time because of the missing symbol (and there's no portable/easy way to force load-time lookup from what I see in the ltdl documentation).

One easy workaround is to set HWLOC_PLUGINS_PATH=/none in the environment, so that no hwloc plugin is found. But this may remove some features. The proper fix for now is to rebuild hwloc without plugins. So we don't have to hurry and fix this for v1.7.2, but we can still look at it for v1.8.

Two solutions were envisioned earlier:
* Have hwloc plugins depend on libhwloc. Jeff didn't like it because it will cause multiple instances of libhwloc to be loaded, which will break if we have internal/global state in libhwloc. I think we actually have no such internal state, but this way may still be dangerous.
* Have the core tell plugins where core symbols are. Basically means doing our own symbol lookup manually. Possible issues:
  + We have maaaaaaany symbols, it's not easy to define which ones are available to plugins and which ones are not. Quick look [1].
  + Plugins won't be able to call hwloc functions directly anymore, and they won't be able to use inline helpers anymore (since those often call hwloc core functions explicitly).
  + Need to implement that without causing future ABI breaks when extending the API that is available to plugins. Maybe have plugins pass an array of strings listing which symbols they need.

Other ideas?

Brice

[1] Review of public symbols:

Things that shouldn't be available to plugins:
* init/load/destroy
* topology_set_*() topology_ignore_*() topology_restrict()
* XML export/import
* cpubind/membind/last_cpu_location (as well as alloc/free)
* custom_insert_*

Things that should be available:
* hwloc/plugins.h
* other insert() functions (not sure)
* most of our get() functions
* most stringification functions
* minor other things (about 30 total)

hwloc/bitmap.h is the biggest problem: plugins should be allowed to use all of its functions, but there are maaaaany of them. Splitting hwloc-bitmap.so out of hwloc.so would be an easy way to solve this. The bitmap API is totally independent from the hwloc core anyway.
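
To make the second solution a bit more concrete (the core telling plugins where core symbols are, with plugins passing an array of strings listing what they need), here is a rough sketch of how the manual lookup could look. Everything in it is hypothetical and only for illustration: core_get_depth(), core_get_nbobjs(), core_exports, core_resolve() and demo_plugin_init() are made-up names, not existing hwloc functions, and a real table would list whatever subset of [1] we agree on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function pointer; the plugin casts it back to the real type. */
typedef void (*core_fn)(void);

/* Hypothetical stand-ins for two core functions exported to plugins. */
int core_get_depth(void) { return 3; }
int core_get_nbobjs(void) { return 8; }

/* Table of core symbols that plugins are allowed to request.
 * Adding an entry later does not change the layout seen by old plugins. */
struct core_symbol {
  const char *name;
  core_fn fn;
};

static const struct core_symbol core_exports[] = {
  { "core_get_depth",  (core_fn) core_get_depth },
  { "core_get_nbobjs", (core_fn) core_get_nbobjs },
};

/* Core side: resolve names[0..n-1] into out[0..n-1].
 * Returns 0 if every name was found, -1 as soon as one is missing,
 * so a plugin can be rejected when it is loaded instead of aborting later. */
int core_resolve(const char *const *names, size_t n, core_fn *out)
{
  size_t nexports = sizeof(core_exports) / sizeof(core_exports[0]);
  size_t i, j;

  assert(names != NULL && out != NULL);
  for (i = 0; i < n; i++) {
    out[i] = NULL;
    for (j = 0; j < nexports; j++) {
      if (!strcmp(names[i], core_exports[j].name)) {
        out[i] = core_exports[j].fn;
        break;
      }
    }
    if (!out[i])
      return -1;
  }
  return 0;
}

/* Plugin side: list the needed symbols by name, then call through pointers. */
typedef int (*get_int_fn)(void);

int demo_plugin_init(void)
{
  static const char *const needed[] = { "core_get_depth", "core_get_nbobjs" };
  core_fn resolved[2];
  get_int_fn get_depth, get_nbobjs;

  if (core_resolve(needed, 2, resolved) < 0)
    return -1; /* refuse to load this plugin */

  get_depth = (get_int_fn) resolved[0];
  get_nbobjs = (get_int_fn) resolved[1];
  return get_depth() + get_nbobjs();
}
```

The point of looking things up by name is that a missing symbol shows up as an error when the plugin is initialized, and the core can grow the table without breaking plugins built against an older one. It also shows the cost mentioned above: the plugin has to call everything through pointers, so direct calls and the inline helpers no longer work as they are.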