[ 
https://issues.apache.org/jira/browse/MESOS-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951914#comment-15951914
 ] 

Yu Yang commented on MESOS-6810:
--------------------------------

Sorry for forgetting to post my solution here.

This error is caused by a connection problem between the Mesos cluster and the 
Docker registry, so the solution is straightforward. If you are in China, you 
may need to deploy a Docker mirror or a private Docker registry. Third-party 
services such as DaoCloud and Aliyun also work. Test a few and pick the one 
that works best for you, then change the {{--docker_registry}} setting. 
Increasing the value of {{--registry_fetch_timeout}} also helps when your 
network is unstable.
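
One way to "test and find the best one" is to time a request against each 
candidate registry and keep the fastest. The following is only a rough sketch: 
the helper names {{probe_registry}} and {{pick_fastest}} are made up for 
illustration, the mirror URL in the usage comment is a placeholder, and it 
assumes each candidate serves the Docker Registry HTTP API v2 under {{/v2/}}.

```shell
#!/bin/sh
# Hypothetical helpers for comparing registry endpoints; not part of Mesos.

# Time one request to the registry's /v2/ endpoint.
# Prints "<seconds> <url>" on success, nothing if curl fails or times out.
# A 401 response still counts as reachable, since curl exits 0 without -f.
probe_registry() {
    t=$(curl -s -o /dev/null --max-time 10 -w '%{time_total}' "$1/v2/") \
        && echo "$t $1"
}

# Read "<seconds> <url>" lines on stdin and print the URL with the lowest time.
pick_fastest() {
    sort -n | head -n 1 | cut -d' ' -f2
}

# Example usage (needs network access; mirror.example.com is a placeholder):
#   best=$(for r in https://registry-1.docker.io https://mirror.example.com; do
#              probe_registry "$r"
#          done | pick_fastest)
#   echo "fastest registry: $best"
```

The resulting URL could then be passed to the agent through 
{{--docker_registry}}. A single timing is noisy, so repeating the probe a few 
times before deciding is probably worthwhile.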

> Tasks getting stuck in STAGING state when using unified containerizer
> ---------------------------------------------------------------------
>
>                 Key: MESOS-6810
>                 URL: https://issues.apache.org/jira/browse/MESOS-6810
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 1.0.0, 1.0.1, 1.1.0
>         Environment: *OS*: ubuntu16.04 64bit
> *mesos*: 1.1.0, one master and one agent on same machine
> *Agent flag*: {{sudo ./bin/mesos-agent.sh --master=192.168.1.192:5050 
> --work_dir=/tmp/mesos_slave --image_providers=docker 
> --isolation=docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia 
> --containerizers=mesos,docker --executor_environment_variables="{}"}}
>            Reporter: Yu Yang
>
> When I submit tasks using container settings like:
> {code}
> {
>     "container": {
>         "mesos": {
>           "image": {
>               "docker": {
>                   "name": "nvidia/cuda"
>               },
>               "type": "DOCKER"
>           }
>         },
>         "type": "MESOS"
>     }
> }
> {code}
> then the task gets stuck in STAGING state and finally fails with the 
> message {{Failed to launch container: Collect failed: Failed to perform 
> 'curl': curl: (56) GnuTLS recv error (-54): Error in pull function}}.
> This is the related log on the agent:
> {code}
> I1217 13:05:35.406365 20780 slave.cpp:1539] Got assigned task 
> 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' for framework 
> 02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:35.406749 20780 slave.cpp:1701] Launching task 
> 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' for framework 
> 02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:35.406970 20780 paths.cpp:536] Trying to chown 
> '/tmp/mesos_slave/slaves/02083c57-b2d9-4054-babe-90e962816813-S0/frameworks/02083c57-b2d9-4054-babe-90e962816813-0001/executors/mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591/runs/8be3b5cd-afa3-4189-aa2a-f09d73529f8c'
>  to user 'root'
> I1217 13:05:35.409272 20780 slave.cpp:6179] Launching executor 
> 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' of framework 
> 02083c57-b2d9-4054-babe-90e962816813-0001 with resources cpus(*):0.1; 
> mem(*):32 in work directory 
> '/tmp/mesos_slave/slaves/02083c57-b2d9-4054-babe-90e962816813-S0/frameworks/02083c57-b2d9-4054-babe-90e962816813-0001/executors/mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591/runs/8be3b5cd-afa3-4189-aa2a-f09d73529f8c'
> I1217 13:05:35.409958 20780 slave.cpp:1987] Queued task 
> 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' for executor 
> 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' of framework 
> 02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:35.410163 20779 docker.cpp:1000] Skipping non-docker container
> I1217 13:05:35.410636 20776 containerizer.cpp:938] Starting container 
> 8be3b5cd-afa3-4189-aa2a-f09d73529f8c for executor 
> 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' of framework 
> 02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:44.459362 20778 slave.cpp:4992] Terminating executor 
> ''cuda_mesos_nvidia_tf.72e9b9cf-8220-49bd-86fe-1667ee5e7a02' of framework 
> 02083c57-b2d9-4054-babe-90e962816813-0001' because it did not register within 
> 1mins
> I1217 13:05:53.586819 20780 slave.cpp:5044] Current disk usage 63.59%. Max 
> allowed age: 1.848503351525151days
> I1217 13:06:35.410905 20777 slave.cpp:4992] Terminating executor 
> ''mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' of framework 
> 02083c57-b2d9-4054-babe-90e962816813-0001' because it did not register within 
> 1mins
> I1217 13:06:35.411175 20780 containerizer.cpp:1950] Destroying container 
> 8be3b5cd-afa3-4189-aa2a-f09d73529f8c in PROVISIONING state
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
