[OMPI devel] Proposed change to Cuda dependencies in Open MPI

Zhang, William via devel Fri, 09 Sep 2022 14:25:51 -0700

Hello interested parties,

As part of the work on the accelerator framework, the nonstandard behavior of 
the existing CUDA code in Open MPI is being reworked. One of the proposed 
changes affects how CUDA components are compiled and linked.


Currently, CUDA functions are loaded dynamically using dlopen and stored in a 
function pointer table, with some code to search through typical paths to 
locate libcuda. This means that we can configure Open MPI with 
--with-cuda=/path/to/cuda and the resulting build should work in both CUDA and 
non-CUDA environments.
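
For readers less familiar with this pattern, here is a minimal sketch of a 
dlopen-based function pointer table. It is not the actual Open MPI code: the 
struct, the function name cuda_table_load, and the candidate path list are 
hypothetical, and cuInit is declared with a simplified int return type instead 
of the CUresult type from cuda.h.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Simplified stand-in for "CUresult cuInit(unsigned int)"; real code would
 * take the types from cuda.h. */
typedef int (*cu_init_fn)(unsigned int);

/* Hypothetical table with a single entry; a real table holds many more. */
struct cuda_fn_table {
    void *handle;       /* dlopen handle, NULL if libcuda was not found */
    cu_init_fn cuInit;  /* resolved with dlsym, NULL if not found */
};

/* Try each candidate library path in order. Returns 0 once one of them
 * opens and provides cuInit, or -1 if none does. */
static int cuda_table_load(struct cuda_fn_table *t,
                           const char *const *candidates, size_t n)
{
    assert(t != NULL);
    t->handle = NULL;
    t->cuInit = NULL;

    for (size_t i = 0; i < n; i++) {
        void *h = dlopen(candidates[i], RTLD_LAZY | RTLD_LOCAL);
        if (h == NULL) {
            continue;               /* not at this path, try the next one */
        }
        cu_init_fn fn = (cu_init_fn)dlsym(h, "cuInit");
        if (fn == NULL) {
            dlclose(h);             /* opened, but not a usable libcuda */
            continue;
        }
        t->handle = h;
        t->cuInit = fn;
        return 0;
    }
    return -1;                      /* no libcuda: caller disables CUDA paths */
}
```

A caller would pass something like { "libcuda.so.1", "/usr/lib64/libcuda.so.1" } 
(assumed paths) and fall back to host-only behavior when -1 is returned. On 
older glibc versions this needs to be linked with -ldl.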

The change we are making involves removing the function pointer table and 
instead giving the relevant components a direct dependency on libcuda. This 
is in line with the rest of Open MPI's MCA system, where components can be 
built as DSOs.

The differences are: Open MPI will call libcuda functions directly, and 
components that have a CUDA dependency will be built as DSOs (i.e. 
--with-cuda=/path/to/cuda/ --enable-mca-dso=accelerator-cuda). During dynamic 
linking at run time, these DSOs may fail to load, for example in a non-CUDA 
environment, but this won't prevent Open MPI from functioning. Related work 
(https://github.com/open-mpi/ompi/pull/10763) to add an option that silences 
the warnings emitted on this expected path is also in progress.
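
To illustrate why a DSO that fails to load is not fatal, here is a rough 
sketch of a loader that skips components it cannot open. This is not the MCA 
implementation: open_components, the quiet flag, and the component paths are 
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Try to open every component DSO in paths. Handles of the components that
 * loaded are stored in handles (which must have room for n entries), and
 * their count is returned. A component that fails to load, for example
 * because libcuda is missing, is skipped rather than treated as an error. */
static size_t open_components(const char *const *paths, size_t n,
                              void **handles, int quiet)
{
    assert(paths != NULL && handles != NULL);
    size_t loaded = 0;

    for (size_t i = 0; i < n; i++) {
        /* RTLD_NOW resolves all symbols up front, so a missing dependency
         * is reported here and not at the first call. */
        void *h = dlopen(paths[i], RTLD_NOW | RTLD_LOCAL);
        if (h == NULL) {
            if (!quiet) {
                const char *err = dlerror();
                fprintf(stderr, "skipping component %s: %s\n",
                        paths[i], err != NULL ? err : "unknown error");
            }
            continue;               /* expected on hosts without CUDA */
        }
        handles[loaded++] = h;
    }
    return loaded;
}
```

With quiet set to a nonzero value the warning is suppressed, which is roughly 
the kind of behavior the option mentioned above is meant to provide.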

From a user's perspective, nothing changes. For compilation, dependent 
components will need to be built as DSOs. In the code, we can remove the 
dlopen dependency for CUDA builds, standardize the CUDA code with the rest of 
Open MPI, and remove the code that stores function pointers and detects the 
location of libcuda.

Please provide feedback if you have any suggestions or are against these 
changes.

Thanks,
William Zhang
