Zha0q1 opened a new issue #20145:
URL: https://github.com/apache/incubator-mxnet/issues/20145


   The nightly Docker image `public.ecr.aws/w6z5f7h2/mxnet/python:nightly_gpu_cu112_py3` fails on `import mxnet` with an undefined-symbol error:
   ```
   >>> import mxnet
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/usr/local/lib/python3.7/dist-packages/mxnet/__init__.py", line 23, in <module>
       from .context import Context, current_context, cpu, gpu, cpu_pinned
     File "/usr/local/lib/python3.7/dist-packages/mxnet/context.py", line 20, in <module>
       from .base import _LIB
     File "/usr/local/lib/python3.7/dist-packages/mxnet/base.py", line 293, in <module>
       _LIB = _load_lib()
     File "/usr/local/lib/python3.7/dist-packages/mxnet/base.py", line 284, in _load_lib
       lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
     File "/usr/lib/python3.7/ctypes/__init__.py", line 364, in __init__
       self._handle = _dlopen(self._name, mode)
   OSError: /usr/local/lib/python3.7/dist-packages/mxnet/libmxnet.so: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2
   ```
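
To narrow this down without importing all of `mxnet`, the load step from the traceback can be repeated on its own. The following is only a minimal sketch: `missing_symbol` and `try_load` are hypothetical helper names, not part of MXNet, and the library path is copied from the traceback and may differ in other images.

```python
import ctypes
import re

# Matches the dynamic loader's "undefined symbol" message seen in the traceback.
_UNDEFINED_SYMBOL = re.compile(r"undefined symbol: (\S+)")


def missing_symbol(message):
    """Return the symbol named in a loader error message, or None if absent."""
    match = _UNDEFINED_SYMBOL.search(message)
    return match.group(1) if match else None


def try_load(lib_path):
    """Try to load lib_path; return None on success, else the error text."""
    try:
        ctypes.CDLL(lib_path, ctypes.RTLD_LOCAL)
    except OSError as err:
        return str(err)
    return None


if __name__ == "__main__":
    # Path taken from the traceback; adjust it for your environment.
    path = "/usr/local/lib/python3.7/dist-packages/mxnet/libmxnet.so"
    error = try_load(path)
    if error is None:
        print("loaded without error")
    else:
        print("load failed:", error)
        print("missing symbol:", missing_symbol(error))
```

Run inside the affected container, this should report the same missing symbol as the traceback if the problem is in loading `libmxnet.so` itself rather than in the Python package.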
   The error is most likely because the NVML shipped with CUDA 11.2 (cu112) introduces a new `_v2` API.
   Checking `nvml.h` in `nvidia/cuda:11.2.0-cudnn8-devel-centos7` confirms this:
   ```
    * NVML API versioning support
    */
   #define NVML_API_VERSION            11
   #define NVML_API_VERSION_STR        "11"
   /**
    * Defining NVML_NO_UNVERSIONED_FUNC_DEFS will disable "auto upgrading" of APIs.
    * e.g. the user will have to call nvmlInit_v2 instead of nvmlInit. Enable this
    * guard if you need to support older versions of the API
    */
   #ifndef NVML_NO_UNVERSIONED_FUNC_DEFS
       #define nvmlInit                                nvmlInit_v2
       #define nvmlDeviceGetPciInfo                    nvmlDeviceGetPciInfo_v3
       #define nvmlDeviceGetCount                      nvmlDeviceGetCount_v2
       #define nvmlDeviceGetHandleByIndex              nvmlDeviceGetHandleByIndex_v2
       #define nvmlDeviceGetHandleByPciBusId           nvmlDeviceGetHandleByPciBusId_v2
       #define nvmlDeviceGetNvLinkRemotePciInfo        nvmlDeviceGetNvLinkRemotePciInfo_v2
       #define nvmlDeviceRemoveGpu                     nvmlDeviceRemoveGpu_v2
       #define nvmlDeviceGetGridLicensableFeatures     nvmlDeviceGetGridLicensableFeatures_v3
       #define nvmlEventSetWait                        nvmlEventSetWait_v2
       #define nvmlDeviceGetAttributes                 nvmlDeviceGetAttributes_v2
       #define nvmlComputeInstanceGetInfo              nvmlComputeInstanceGetInfo_v2
       #define nvmlDeviceGetComputeRunningProcesses    nvmlDeviceGetComputeRunningProcesses_v2
       #define nvmlDeviceGetGraphicsRunningProcesses   nvmlDeviceGetGraphicsRunningProcesses_v2
   #endif // #ifndef NVML_NO_UNVERSIONED_FUNC_DEFS
   ..........
   ..........
   nvmlReturn_t DECLDIR nvmlDeviceGetComputeRunningProcesses_v2(nvmlDevice_t device, unsigned int *infoCount, nvmlProcessInfo_t *infos);
   
   ```
   We can probably work around this by defining `NVML_NO_UNVERSIONED_FUNC_DEFS` when building, so that the unversioned function names are not remapped to their `_v2`/`_v3` variants.
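
Before changing the build, it may help to check which of the two names the NVML library visible in the container actually exports. This is a rough sketch under a few assumptions: `exported_symbols` is a hypothetical helper, and `libnvidia-ml.so.1` is assumed to be the soname of the NVML library that `libmxnet.so` resolves against at load time.

```python
import ctypes

# Unversioned and versioned names of the call from the error message.
SYMBOLS = (
    "nvmlDeviceGetComputeRunningProcesses",
    "nvmlDeviceGetComputeRunningProcesses_v2",
)


def exported_symbols(lib_name, names=SYMBOLS):
    """Map each name to True/False depending on whether lib_name exports it.

    Returns None when the library itself cannot be loaded.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None
    # Attribute lookup on a CDLL resolves the symbol and raises
    # AttributeError when it is missing, so hasattr works as a probe.
    return {name: hasattr(lib, name) for name in names}


if __name__ == "__main__":
    # Assumed soname of the driver's NVML library on Linux.
    print(exported_symbols("libnvidia-ml.so.1"))
```

If only the unversioned name is reported as present, that would be consistent with the library predating the `_v2` API, which is the case `NVML_NO_UNVERSIONED_FUNC_DEFS` is meant to cover.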




