The goal is to let users leverage the Nvidia Docker images
(https://hub.docker.com/r/nvidia/) without any added effort on their
part. With Docker, they can launch containers from these images by
simply running `nvidia-docker run ...` (i.e. they are unaware that a
magic volume is being injected on their behalf). On Mesos we want the
experience to be similar.

As for providing an external component to do the library
consolidation instead of building it into Mesos itself -- we did
consider this. We originally planned to build this functionality
as an isolator module (giving us the benefit of external linkage
without having to run a separate Linux process), but there are some
limitations in the current isolator interface that prevent us from
doing this properly. Moreover, building it as an isolator module would
mean that it couldn't be shared with the docker containerizer (which we
plan to add support for in the future).
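
To make the "consolidation" step concrete, here is a minimal sketch in
Python. It is not Mesos or nvidia-docker code. The function name
`consolidate` is hypothetical, and it keys files by basename as a
simplification (the real approach discussed in this thread uses the
SONAME read via libelf). It only shows the idea of pulling files that
are spread across the file system into one directory that can then be
injected as a volume.

```python
import os
import shutil


def consolidate(paths, volume_dir):
    """Copy each existing file in `paths` into `volume_dir`.

    Files are stored under their basename (an assumption made for this
    sketch). Returns a dict mapping basename -> source path for every
    file that was copied.
    """
    os.makedirs(volume_dir, exist_ok=True)
    copied = {}
    for path in paths:
        if not os.path.isfile(path):
            continue  # not installed on this host, skip it
        name = os.path.basename(path)
        if name in copied:
            continue  # first match wins; ignore later duplicates
        shutil.copy2(path, os.path.join(volume_dir, name))
        copied[name] = path
    return copied
```

An operator-side script could call this with whatever list of
binaries / libraries applies to their OS; doing it inside Mesos just
means nobody has to maintain that list by hand.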

On Mon, Jun 20, 2016 at 7:30 PM, Jean Christophe “JC” Martin
<jch.mar...@gmail.com> wrote:
> Kevin,
>
> I agree about the need to create the volume, and gather the information. My 
> point was not really clear, sorry.
> My point was that it should not be different than any use case needing 
> special mounts and could either be solved by passing this information at the 
> time of container creation (it doesn’t seem that there are that many 
> libraries, and it would not be harder than say running the mesos slave in a 
> container, purely from a number of volume statements), or it could be solved 
> externally as the docker volume container does with a more generic solution.
>
> Thanks,
>
> JC
>
>> On Jun 20, 2016, at 6:59 PM, Kevin Klues <klue...@gmail.com> wrote:
>>
>> For now we've decided to actually remove the hard dependence on libelf
>> for the 1.0 release and spend a bit more time thinking about the right
>> way to pull it in.
>>
>> Jean, to answer your question though -- someone would still need to
>> consolidate these libraries, even if it wasn't left to Mesos to do so.
>> These libraries are spread across the file system, and need to be
>> pulled into a single place for easy injection. The full list of
>> binaries / libraries are here:
>>
>> https://github.com/NVIDIA/nvidia-docker/blob/master/tools/src/nvidia/volumes.go#L109
>>
>> We could put this burden on the operator and trust he gets it right,
>> or we could have Mesos programmatically do it itself. We considered
>> just leveraging the nvidia-docker-plugin itself (instead of
>> duplicating its functionality into mesos), but ultimately decided it
>> was better not to introduce an external dependency on it (since it is
>> a separate running executable, rather than a simple library, like
>> libelf).
>>
>> On Mon, Jun 20, 2016 at 5:12 PM, Jean Christophe “JC” Martin
>> <jch.mar...@gmail.com> wrote:
>>> As an operator not using GPUs, I feel that the burden seems misplaced, and 
>>> disproportionate.
>>> I assume that the operator of a GPU cluster knows the location of the 
>>> libraries based on their OS, and could potentially provide this information 
>>> at the time of creating the containers. I am not sure I see why this is
>>> something that Mesos is required to do (consolidating the libraries in the
>>> volume, versus this being configuration/external information).
>>>
>>> Thanks,
>>>
>>> JC
>>>
>>>> On Jun 20, 2016, at 2:30 PM, Kevin Klues <klue...@gmail.com> wrote:
>>>>
>>>> Sorry, the ticket just links to the nvidia-docker project without much
>>>> further explanation. The information at the link below should make it
>>>> a bit more clear:
>>>>
>>>> https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver.
>>>>
>>>> The crux of the issue is that we need to be able to consolidate all of
>>>> the Nvidia binaries/libraries into a single volume that we inject into
>>>> a docker container.  libelf is used to get the canonical names
>>>> of all the Nvidia libraries (i.e. SONAME in their dynamic sections) as
>>>> well as to look up what external dependencies they have (i.e. NEEDED in
>>>> their dynamic sections) in order to build this volume.
>>>>
>>>> NOTE: None of this volume support is actually in Mesos yet -- we just
>>>> added the libelf dependence in anticipation of it.
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jun 20, 2016 at 12:59 PM, Yan Xu <xuj...@apple.com> wrote:
>>>>> It's not immediately clear from the ticket why the change from optional
>>>>> dependency to required dependency though? Could you summarize?
>>>>>
>>>>>
>>>>> On Sun, Jun 19, 2016 at 12:33 PM, Kevin Klues <klue...@gmail.com> wrote:
>>>>>>
>>>>>> Thanks Zhitao,
>>>>>>
>>>>>> I just pushed out a review for upgrades.md and added you as a reviewer.
>>>>>>
>>>>>> The new dependence was added in the JIRA that haosdent linked, but the
>>>>>> actual reason for adding the dependence is more related to:
>>>>>> https://issues.apache.org/jira/browse/MESOS-5401
>>>>>>
>>>>>> On Sun, Jun 19, 2016 at 9:34 AM, haosdent <haosd...@gmail.com> wrote:
>>>>>>> The related issue is "Change build to always enable Nvidia GPU support
>>>>>>> for Linux".
>>>>>>> Last time, my local build broke before Kevin sent out the email, and
>>>>>>> then I found this change.
>>>>>>>
>>>>>>> On Mon, Jun 20, 2016 at 12:11 AM, Zhitao Li <zhitaoli...@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Kevin,
>>>>>>>>
>>>>>>>> Thanks for letting us know. It seems like this is not called out in
>>>>>>>> upgrades.md, so can you please document this additional dependency
>>>>>>>> there?
>>>>>>>>
>>>>>>>> Also, can you include the link to the JIRA or patch requiring this
>>>>>>>> dependency so we can have some context?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> On Sat, Jun 18, 2016 at 10:25 AM, Kevin Klues <klue...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hello all,
>>>>>>>>>
>>>>>>>>> Just an FYI that the newest libmesos now has an external dependence
>>>>>>>>> on libelf on Linux. This dependence can be installed via the
>>>>>>>>> following packages:
>>>>>>>>>
>>>>>>>>> CentOS 6/7:     yum install elfutils-libelf.x86_64
>>>>>>>>> Ubuntu 14.04:   apt-get install libelf1
>>>>>>>>>
>>>>>>>>> Alternatively you can install from source:
>>>>>>>>> https://directory.fsf.org/wiki/Libelf
>>>>>>>>>
>>>>>>>>> For developers, you will also need to install the libelf headers in
>>>>>>>>> order to build master. This dependency can be installed via:
>>>>>>>>>
>>>>>>>>> CentOS: elfutils-libelf-devel.x86_64
>>>>>>>>> Ubuntu: libelf-dev
>>>>>>>>>
>>>>>>>>> Alternatively, you can install from source:
>>>>>>>>> https://directory.fsf.org/wiki/Libelf
>>>>>>>>>
>>>>>>>>> The getting started guide and the support/docker_build.sh scripts
>>>>>>>>> have been updated appropriately, but you may need to update your
>>>>>>>>> local environment if you don't yet have these packages installed.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ~Kevin
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Zhitao Li
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Haosdent Huang
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ~Kevin
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ~Kevin
>>>
>>
>>
>>
>> --
>> ~Kevin
>



-- 
~Kevin
