[ https://issues.apache.org/jira/browse/MESOS-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15598490#comment-15598490 ]

Dylan Bethune-Waddell commented on MESOS-6383:
----------------------------------------------

Hi Kevin,

First of all, this is just the first missing symbol I hit. I will cross-reference the NVML changelog and the Mesos code for additional symbols that might need a fallback when I get a chance.
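
For what it's worth, a quick way to see which NVML entry points a given libnvidia-ml.so.1 actually exports (independent of Mesos) would be to probe them with {{dlopen}}/{{dlsym}} directly. A rough sketch - the symbol list here is just an example, not the result of that cross-referencing:

{code}
// probe_nvml.cpp: check which NVML entry points libnvidia-ml.so.1 exports,
// to see what the Mesos nvml wrapper could rely on for this driver version.
// Build: g++ probe_nvml.cpp -ldl -o probe_nvml
#include <dlfcn.h>

#include <cstdio>

int main()
{
  void* lib = dlopen("libnvidia-ml.so.1", RTLD_LAZY);
  if (lib == NULL) {
    fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return 1;
  }

  // Example symbols only; the real list should come from cross-referencing
  // the Mesos nvml wrapper against the NVML changelog.
  const char* symbols[] = {
    "nvmlInit",
    "nvmlDeviceGetCount",
    "nvmlDeviceGetHandleByIndex",
    "nvmlDeviceGetPciInfo",
    "nvmlDeviceGetUUID",
    "nvmlDeviceGetMinorNumber",  // Only in driver >= 331 per the changelog.
  };

  for (size_t i = 0; i < sizeof(symbols) / sizeof(symbols[0]); i++) {
    printf("%-28s %s\n", symbols[i],
           dlsym(lib, symbols[i]) != NULL ? "present" : "MISSING");
  }

  dlclose(lib);
  return 0;
}
{code}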

Second, I am not sure that {{nvidia-smi}} even tries to get the minor number in earlier versions, since from what I've read it essentially wraps the NVML library. The man page on our cluster for driver version 319.72 / CUDA 5.5 does not have the "GPU Attributes -> Minor Number" entry that appears on the manpage for [later versions of the driver|http://developer.download.nvidia.com/compute/cuda/6_0/rel/gdk/nvidia-smi.331.38.pdf]. We might not be able to use the CUDA runtime either, as CUDA and NVML can enumerate the device IDs differently according to [{{cuda-smi}}|https://github.com/al42and/cuda-smi]. That page also offers an anecdote about CUDA 7.0 onwards including the ability to set {{CUDA_DEVICE_ORDER=PCI_BUS_ID}}, which "makes this tool slightly less useful", but the [{{nvidia-docker}}|https://github.com/NVIDIA/nvidia-docker/wiki/GPU-isolation] explanation of GPU isolation indicates that the PCI bus ordering may not be consistent with the device character file minor number anyway. I also don't like {{CUDA_VISIBLE_DEVICES}} being a factor, but perhaps I'm just being paranoid. The nvidia-smi manpage also notes that "It is recommended that users desiring consistency use either UUID or PCI bus ID, since device enumeration ordering is not guaranteed to be consistent between reboots and board serial number might be shared between multiple GPUs on the same board". I'm not the best person to interpret what all of this means, but these are the places I've been looking for reference.
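
If it helps, the UUID-to-PCI-bus-ID correlation itself looks obtainable from NVML symbols that predate {{nvmlDeviceGetMinorNumber}} in the changelog ({{nvmlDeviceGetPciInfo}}, {{nvmlDeviceGetUUID}}). I haven't actually verified this against 319.72 yet, so treat the following as a sketch only:

{code}
// list_gpus.cpp: print the PCI bus ID and UUID for each NVML-visible device
// using only calls that, per the changelog, predate driver 331.
// Build: g++ list_gpus.cpp -lnvidia-ml -o list_gpus   (nvml.h from the GDK)
#include <nvml.h>

#include <cstdio>

int main()
{
  if (nvmlInit() != NVML_SUCCESS) {
    fprintf(stderr, "nvmlInit failed\n");
    return 1;
  }

  unsigned int count = 0;
  nvmlDeviceGetCount(&count);

  for (unsigned int i = 0; i < count; i++) {
    nvmlDevice_t device;
    if (nvmlDeviceGetHandleByIndex(i, &device) != NVML_SUCCESS) {
      continue;
    }

    nvmlPciInfo_t pci;
    char uuid[96] = {0};

    nvmlDeviceGetPciInfo(device, &pci);
    nvmlDeviceGetUUID(device, uuid, sizeof(uuid));

    // busId is of the form "0000:20:00.0"; uuid is "GPU-xxxxxxxx-...".
    printf("index=%u busId=%s uuid=%s\n", i, pci.busId, uuid);
  }

  nvmlShutdown();
  return 0;
}
{code}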

To me this suggested that the best approach might be to figure out which character device file in {{/dev/nvidia*}} maps to which PCI bus location, in order to correlate the minor number with each GPU's UUID. I was fairly sure it would be easy to find a canonical way to take the major/minor number of the {{/dev/nvidia1}} device file, for example, and figure out the PCI info for the device associated with that file - but no luck yet. Also, the {{nvidia-modprobe}} project led me to believe that different distros [treat the creation of device files differently but automatically|https://github.com/NVIDIA/nvidia-modprobe/blob/master/nvidia-modprobe.c#L18-L23], so poking around in various places in a distro-dependent manner might work, although hopefully there's a better way than that. I am not clear that the way device files get created for GPUs via the {{nvidia-modprobe}} utility is deterministic, and I suspect it is not: it seems that [matching devices are just counted|https://github.com/NVIDIA/nvidia-modprobe/blob/master/modprobe-utils/pci-sysfs.c#L146-L158] to figure out [how many device files to create|https://github.com/NVIDIA/nvidia-modprobe/blob/master/nvidia-modprobe.c#L192-L201]. I probably didn't spend enough time going over the code there to offer any definitive insights, though.
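
For completeness, reading the major/minor numbers off the device files themselves is the easy part - the open question is only how to tie those numbers back to a PCI location or UUID. A minimal sketch:

{code}
// dev_minor.cpp: print the character-device major/minor numbers for
// /dev/nvidia0, /dev/nvidia1, ... This only shows what the kernel assigned;
// it does not by itself say which physical GPU (PCI location / UUID) each
// device node corresponds to.
// Build: g++ dev_minor.cpp -o dev_minor
#include <sys/stat.h>
#include <sys/types.h>

#include <cstdio>

int main()
{
  for (int i = 0; ; i++) {
    char path[32];
    snprintf(path, sizeof(path), "/dev/nvidia%d", i);

    struct stat s;
    if (stat(path, &s) != 0) {
      break;  // Stop at the first missing device node.
    }

    printf("%s major=%u minor=%u\n",
           path,
           major(s.st_rdev),
           minor(s.st_rdev));
  }

  return 0;
}
{code}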

I did find, on CentOS 6.4 (which is what we're running), a {{/proc/driver/nvidia/gpus/[0,1,etc?]}} directory for each GPU, where the file {{/proc/driver/nvidia/gpus/0/information}} reads like this for an unprivileged user:

bq. Model:           Tesla K20m
IRQ:             40
GPU UUID:        GPU-????????-????-????-????-????????????
Video BIOS:      ??.??.??.??.??
Bus Type:        PCIe
DMA Size:        40 bits
DMA Mask:        0xffffffffff
Bus Location:    0000:20.00.0

So an ugly hack for my specific case would be to see if those 0/1 numbers correspond to {{/dev/nvidia[0,1]}}, and if so I can just check the bus location info and parse the directory name instead of using NVML. Seems pretty bad.
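
A sketch of that hack, assuming the directory names and the {{information}} file format stay exactly as shown above (which I wouldn't count on across driver versions):

{code}
// proc_gpus.cpp: parse /proc/driver/nvidia/gpus/<N>/information for the
// "GPU UUID" and "Bus Location" fields, as observed with driver 319.72 on
// CentOS 6.4. Neither the directory naming nor the file format is
// guaranteed to be stable across driver versions.
// Build: g++ proc_gpus.cpp -o proc_gpus
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>

// Return the value part of a "Key:   value" line, or "" if it doesn't match.
static std::string value(const std::string& line, const std::string& key)
{
  if (line.compare(0, key.size(), key) != 0) {
    return "";
  }
  size_t pos = line.find_first_not_of(" \t", key.size());
  return pos == std::string::npos ? "" : line.substr(pos);
}

int main()
{
  for (int i = 0; ; i++) {
    char path[64];
    snprintf(path, sizeof(path),
             "/proc/driver/nvidia/gpus/%d/information", i);

    std::ifstream file(path);
    if (!file.is_open()) {
      break;  // No more GPU directories.
    }

    std::string uuid, busLocation, line;
    while (std::getline(file, line)) {
      if (uuid.empty()) {
        uuid = value(line, "GPU UUID:");
      }
      if (busLocation.empty()) {
        busLocation = value(line, "Bus Location:");
      }
    }

    std::cout << "gpu " << i
              << " uuid=" << uuid
              << " busLocation=" << busLocation << std::endl;
  }

  return 0;
}
{code}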

WDYT?

> NvidiaGpuAllocator::resources cannot load symbol nvmlDeviceGetMinorNumber - 
> can the device minor number be ascertained reliably using an older set of API 
> calls?
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-6383
>                 URL: https://issues.apache.org/jira/browse/MESOS-6383
>             Project: Mesos
>          Issue Type: Improvement
>    Affects Versions: 1.0.1
>            Reporter: Dylan Bethune-Waddell
>            Priority: Minor
>              Labels: gpu
>
> We're attempting to deploy Mesos on a cluster with 2 Nvidia GPUs per host. We 
> are not in a position to upgrade the Nvidia drivers in the near future, and 
> are currently at driver version 319.72.
> When attempting to launch an agent with the following command and take 
> advantage of Nvidia GPU support (master address elided):
> bq. {{./bin/mesos-agent.sh --master=<masterIP>:<masterPort> 
> --work_dir=/tmp/mesos --isolation="cgroups/devices,gpu/nvidia"}}
> I receive the following error message:
> bq. {{Failed to create a containerizer: Failed call to 
> NvidiaGpuAllocator::resources: Failed to nvml::initialize: Failed to load 
> symbol 'nvmlDeviceGetMinorNumber': Error looking up symbol 
> 'nvmlDeviceGetMinorNumber' in 'libnvidia-ml.so.1' : 
> /usr/lib64/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetMinorNumber}}
> Based on the change log for the NVML module, it seems that 
> {{nvmlDeviceGetMinorNumber}} is only available for driver versions 331 and 
> later as per info under the [Changes between NVML v5.319 Update and 
> v331|http://docs.nvidia.com/deploy/nvml-api/change-log.html#change-log] 
> heading in the NVML API reference.
> Is there an alternate method of obtaining this information at runtime to 
> enable support for older versions of the Nvidia driver? Based on discussion 
> in the design document, obtaining this information from the {{nvidia-smi}} 
> command output is a feasible alternative.
> I am willing to submit a PR that amends the behaviour of 
> {{NvidiaGpuAllocator}} such that it first attempts to call 
> {{nvmlDeviceGetMinorNumber}} via libnvidia-ml and, if the symbol cannot be 
> found, falls back on the {{--nvidia-smi="/path/to/nvidia-smi"}} option 
> obtained from mesos-agent if provided, or attempts to run {{nvidia-smi}} if 
> found on the path, and parses its output to obtain this information. 
> Otherwise, it raises an exception indicating that all of this was attempted.
> Would a function or class for parsing {{nvidia-smi}} output be a useful 
> contribution?


