Zha0q1 opened a new issue #20145:
URL: https://github.com/apache/incubator-mxnet/issues/20145
The nightly Docker image
`public.ecr.aws/w6z5f7h2/mxnet/python:nightly_gpu_cu112_py3` fails on import
with an undefined symbol error:
```
>>> import mxnet
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/dist-packages/mxnet/__init__.py", line 23, in <module>
    from .context import Context, current_context, cpu, gpu, cpu_pinned
  File "/usr/local/lib/python3.7/dist-packages/mxnet/context.py", line 20, in <module>
    from .base import _LIB
  File "/usr/local/lib/python3.7/dist-packages/mxnet/base.py", line 293, in <module>
    _LIB = _load_lib()
  File "/usr/local/lib/python3.7/dist-packages/mxnet/base.py", line 284, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
  File "/usr/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/local/lib/python3.7/dist-packages/mxnet/libmxnet.so: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2
```
This is most likely because the newer NVML (cu112) adds a v2 version of this
API. Checking the NVML header in `nvidia/cuda:11.2.0-cudnn8-devel-centos7`
confirms this:
```
/**
 * NVML API versioning support
 */
#define NVML_API_VERSION            11
#define NVML_API_VERSION_STR        "11"
/**
 * Defining NVML_NO_UNVERSIONED_FUNC_DEFS will disable "auto upgrading" of APIs.
 * e.g. the user will have to call nvmlInit_v2 instead of nvmlInit. Enable this
 * guard if you need to support older versions of the API
 */
#ifndef NVML_NO_UNVERSIONED_FUNC_DEFS
    #define nvmlInit                                nvmlInit_v2
    #define nvmlDeviceGetPciInfo                    nvmlDeviceGetPciInfo_v3
    #define nvmlDeviceGetCount                      nvmlDeviceGetCount_v2
    #define nvmlDeviceGetHandleByIndex              nvmlDeviceGetHandleByIndex_v2
    #define nvmlDeviceGetHandleByPciBusId           nvmlDeviceGetHandleByPciBusId_v2
    #define nvmlDeviceGetNvLinkRemotePciInfo        nvmlDeviceGetNvLinkRemotePciInfo_v2
    #define nvmlDeviceRemoveGpu                     nvmlDeviceRemoveGpu_v2
    #define nvmlDeviceGetGridLicensableFeatures     nvmlDeviceGetGridLicensableFeatures_v3
    #define nvmlEventSetWait                        nvmlEventSetWait_v2
    #define nvmlDeviceGetAttributes                 nvmlDeviceGetAttributes_v2
    #define nvmlComputeInstanceGetInfo              nvmlComputeInstanceGetInfo_v2
    #define nvmlDeviceGetComputeRunningProcesses    nvmlDeviceGetComputeRunningProcesses_v2
    #define nvmlDeviceGetGraphicsRunningProcesses   nvmlDeviceGetGraphicsRunningProcesses_v2
#endif // #ifndef NVML_NO_UNVERSIONED_FUNC_DEFS
..........
..........
nvmlReturn_t DECLDIR nvmlDeviceGetComputeRunningProcesses_v2(nvmlDevice_t device, unsigned int *infoCount, nvmlProcessInfo_t *infos);
```
We can probably work around this by defining `NVML_NO_UNVERSIONED_FUNC_DEFS`
when building, so that `nvmlDeviceGetComputeRunningProcesses` is not remapped
to the `_v2` symbol.
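
One way to check whether a given environment is affected is to ask the dynamic loader whether the NVML library it finds exports the `_v2` symbol. The following is only a rough sketch: the helper name `exports_symbol` is made up for illustration, and `libnvidia-ml.so.1` is an assumption about which library provides NVML on the machine in question.

```python
import ctypes


def exports_symbol(lib_name, symbol):
    """Report whether a shared library exports a symbol.

    Returns True or False when the library loads, and None when the
    library itself cannot be loaded (for example, no driver installed).
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None
    # ctypes raises AttributeError for a missing symbol, so hasattr()
    # doubles as a lookup test.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Assumed library name; adjust for the system being checked.
    nvml = "libnvidia-ml.so.1"
    for name in ("nvmlDeviceGetComputeRunningProcesses",
                 "nvmlDeviceGetComputeRunningProcesses_v2"):
        print(name, "->", exports_symbol(nvml, name))
```

If the unversioned name resolves but the `_v2` name does not, that would be consistent with `libmxnet.so` having been built against a newer NVML header than the library available at load time.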