I'm reading MESOS-6010 [1], which seems very similar to the problem I'm
having, but I do have the variables in all lower case, and I'm assuming that
fix is already part of 1.3.0...
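
One way to confirm that the running agent actually inherited the proxy
variables is to read its environment from /proc rather than trusting the
shell that launched it. This is only a rough sketch, assuming Linux with
/proc available and an agent process named "mesos-agent"; the function name
proxy_vars_from_environ is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the proxy-related variables found in a NUL-separated environ file,
# which is the format of /proc/<pid>/environ on Linux.
# Matches http_proxy, https_proxy and no_proxy in any letter case.
proxy_vars_from_environ() {
    local environ_file="$1"
    tr '\0' '\n' < "$environ_file" | grep -iE '^(https?|no)_proxy=' || true
}

# Example usage (assumption: the agent's process name is "mesos-agent";
# reading another user's environ usually requires root):
#   proxy_vars_from_environ "/proc/$(pgrep -o -x mesos-agent)/environ"
```

If this prints nothing for the agent's PID, the variables never reached the
agent process, whatever the login shell shows.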


[1] https://issues.apache.org/jira/browse/MESOS-6010

On Thu, Jul 20, 2017 at 3:08 PM, William Markito Oliveira <
william.mark...@gmail.com> wrote:

> Hi folks,
>
> I'm trying to set up a Mesos cluster and submit the "hello world" GPU
> example from the documentation page:
>
> mesos-execute --master=10.120.59.5:5050
> --name=gpu-test       --docker_image="nvidia/cuda"
> --command="nvidia-smi"       --framework_capabilities="GPU_RESOURCES"
> --resources="gpus:1"
>
>
> This returns:
>
> I0720 16:02:26.039414 102623 scheduler.cpp:184] Version: 1.3.0
> I0720 16:02:26.040221 102619 scheduler.cpp:470] New master detected at
> master@10.120.59.5:5050
> Subscribed with ID 4d3b5156-85df-4ffd-a8cd-9e0ecaa90e39-0015
> Submitted task 'gpu-test' to agent 'b2d906e8-1207-4ceb-aeb0-42be1150cff8-S4'
> Received status update TASK_FAILED for task 'gpu-test'
>   message: 'Failed to launch container: Failed to perform 'curl': curl:
> (7) Failed to connect to registry-1.docker.io port 443: Connection refused
> '
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
>
> ---
>
> Now, when specifying --containerizer=docker, I receive the following
> output:
>
> mesos-execute --containerizer=docker      --master=10.120.59.5:5050
> --name=gpu-test       --docker_image="nvidia/cuda"
> --command="nvidia-smi"       --framework_capabilities="GPU_RESOURCES"
>   --resources="gpus:1"
> I0720 16:02:59.589102 102719 scheduler.cpp:184] Version: 1.3.0
> I0720 16:02:59.589792 102727 scheduler.cpp:470] New master detected at
> master@10.120.59.5:5050
> Subscribed with ID 4d3b5156-85df-4ffd-a8cd-9e0ecaa90e39-0016
> Submitted task 'gpu-test' to agent 'b2d906e8-1207-4ceb-aeb0-42be1150cff8-S3'
> Received status update TASK_RUNNING for task 'gpu-test'
>   source: SOURCE_EXECUTOR
> Received status update TASK_FAILED for task 'gpu-test'
>   message: 'Container exited with status 127'
>   source: SOURCE_EXECUTOR
>
> So still no success, but a different error.
>
> My environment does have http_proxy and https_proxy variables with proper
> values and I've set them before starting the agents.
>
> Both docker pull and nvidia-docker pull work just fine and can download
> the images.
>
> Any thoughts on how to fix or troubleshoot this?
>
> Thank you!
>
> Version: Mesos 1.3.0
> OS: Ubuntu 16.04
> Docker: 17.06.0-ce (+ NVIDIA-Docker)
>
>
> --
> ~/William
>
