Jihun Kang created MESOS-2990:
---------------------------------
Summary: Task dropped into LOST state
Key: MESOS-2990
URL: https://issues.apache.org/jira/browse/MESOS-2990
Project: Mesos
Issue Type: Bug
Components: slave
Affects Versions: 0.22.1
Environment: RHEL 7.0 ppc64, IBM JDK 1.7.0 SR 7
Reporter: Jihun Kang
Every time I run the "test-framework" command from the shell, Mesos fails to
run any of the tasks. The first task of the framework drops into the *LOST*
state, and the remaining tasks are terminated as well.
The following is the output from "test-framework".
{code}
# ./src/test-framework --master=10.10.14.72:5050
I0706 17:24:44.202020 38486 sched.cpp:157] Version: 0.22.1
I0706 17:24:44.210917 38523 sched.cpp:254] New master detected at
[email protected]:5050
I0706 17:24:44.212316 38523 sched.cpp:264] No credentials provided. Attempting
to register without authentication
I0706 17:24:44.215756 38529 sched.cpp:448] Framework registered with
20150706-154445-168431176-5050-19360-0000
Registered!
Received offer 20150706-154445-168431176-5050-19360-O0 with cpus(*):40;
mem(*):60064; disk(*):46055; ports(*):[31000-32000]
Launching task 0 using offer 20150706-154445-168431176-5050-19360-O0
Launching task 1 using offer 20150706-154445-168431176-5050-19360-O0
Launching task 2 using offer 20150706-154445-168431176-5050-19360-O0
Launching task 3 using offer 20150706-154445-168431176-5050-19360-O0
Launching task 4 using offer 20150706-154445-168431176-5050-19360-O0
Task 0 is in state TASK_LOST
Aborting because task 0 is in unexpected state TASK_LOST with reason 1 from
source 1 with message 'Executor terminated'
I0706 17:24:44.428568 38513 sched.cpp:1623] Asked to abort the driver
I0706 17:24:44.428665 38513 sched.cpp:856] Aborting framework
'20150706-154445-168431176-5050-19360-0000'
I0706 17:24:44.428987 38486 sched.cpp:1589] Asked to stop the driver
I0706 17:24:44.429121 38539 sched.cpp:831] Stopping framework
'20150706-154445-168431176-5050-19360-0000'
{code}
The following was also taken from the slave log.
{code}
I0706 17:24:44.225492 19452 slave.cpp:1144] Got assigned task 0 for framework
20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.226763 19452 slave.cpp:1144] Got assigned task 1 for framework
20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.227041 19452 slave.cpp:1144] Got assigned task 2 for framework
20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.227252 19452 slave.cpp:1254] Launching task 0 for framework
20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.238466 19452 slave.cpp:4208] Launching executor default of
framework 20150706-154445-168431176-5050-19360-0000 in work directory
'/tmp/mesos/slaves/20150706-154445-168431176-5050-19360-S0/frameworks/20150706-154445-168431176-5050-19360-0000/executors/default/runs/d235751e-986c-44ae-a6c9-953814dac2f8'
I0706 17:24:44.239434 19460 containerizer.cpp:484] Starting container
'd235751e-986c-44ae-a6c9-953814dac2f8' for executor 'default' of framework
'20150706-154445-168431176-5050-19360-0000'
I0706 17:24:44.239447 19452 slave.cpp:1401] Queuing task '0' for executor
default of framework '20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.239769 19452 slave.cpp:1144] Got assigned task 3 for framework
20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.240003 19452 slave.cpp:1254] Launching task 1 for framework
20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.240056 19452 slave.cpp:1401] Queuing task '1' for executor
default of framework '20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.240100 19452 slave.cpp:1254] Launching task 2 for framework
20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.240146 19452 slave.cpp:1401] Queuing task '2' for executor
default of framework '20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.240228 19452 slave.cpp:1144] Got assigned task 4 for framework
20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.240414 19452 slave.cpp:1254] Launching task 3 for framework
20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.240460 19452 slave.cpp:1401] Queuing task '3' for executor
default of framework '20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.240501 19452 slave.cpp:1254] Launching task 4 for framework
20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.240592 19452 slave.cpp:1401] Queuing task '4' for executor
default of framework '20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.245450 19460 launcher.cpp:130] Forked child with pid '38542' for
container 'd235751e-986c-44ae-a6c9-953814dac2f8'
I0706 17:24:44.247877 19458 slave.cpp:3165] Monitoring executor 'default' of
framework '20150706-154445-168431176-5050-19360-0000' in container
'd235751e-986c-44ae-a6c9-953814dac2f8'
I0706 17:24:44.346531 19472 containerizer.cpp:1123] Executor for container
'd235751e-986c-44ae-a6c9-953814dac2f8' has exited
I0706 17:24:44.346587 19472 containerizer.cpp:918] Destroying container
'd235751e-986c-44ae-a6c9-953814dac2f8'
I0706 17:24:44.392073 19450 slave.cpp:3223] Executor 'default' of framework
20150706-154445-168431176-5050-19360-0000 exited with status 127
I0706 17:24:44.398993 19450 slave.cpp:2531] Handling status update TASK_LOST
(UUID: f013f24c-5f2d-4623-82e8-b96f46bb3143) for task 0 of framework
20150706-154445-168431176-5050-19360-0000 from @0.0.0.0:0
W0706 17:24:44.399422 19455 containerizer.cpp:814] Ignoring update for unknown
container: d235751e-986c-44ae-a6c9-953814dac2f8
I0706 17:24:44.406010 19450 slave.cpp:2531] Handling status update TASK_LOST
(UUID: 73b14123-8bc2-443b-a04b-89bfe3ff9893) for task 1 of framework
20150706-154445-168431176-5050-19360-0000 from @0.0.0.0:0
W0706 17:24:44.406121 19465 containerizer.cpp:814] Ignoring update for unknown
container: d235751e-986c-44ae-a6c9-953814dac2f8
I0706 17:24:44.412560 19450 slave.cpp:2531] Handling status update TASK_LOST
(UUID: 5c3420c3-9082-4904-8a6e-61b1a0ec52ca) for task 2 of framework
20150706-154445-168431176-5050-19360-0000 from @0.0.0.0:0
W0706 17:24:44.412675 19479 containerizer.cpp:814] Ignoring update for unknown
container: d235751e-986c-44ae-a6c9-953814dac2f8
I0706 17:24:44.418977 19450 slave.cpp:2531] Handling status update TASK_LOST
(UUID: 475a9f26-6f0e-4c11-a959-8be83a633101) for task 3 of framework
20150706-154445-168431176-5050-19360-0000 from @0.0.0.0:0
W0706 17:24:44.419096 19467 containerizer.cpp:814] Ignoring update for unknown
container: d235751e-986c-44ae-a6c9-953814dac2f8
I0706 17:24:44.425416 19450 slave.cpp:2531] Handling status update TASK_LOST
(UUID: dedc265d-c09b-4311-a77f-d9ab4f53960f) for task 4 of framework
20150706-154445-168431176-5050-19360-0000 from @0.0.0.0:0
W0706 17:24:44.425529 19446 containerizer.cpp:814] Ignoring update for unknown
container: d235751e-986c-44ae-a6c9-953814dac2f8
I0706 17:24:44.425901 19478 status_update_manager.cpp:317] Received status
update TASK_LOST (UUID: f013f24c-5f2d-4623-82e8-b96f46bb3143) for task 0 of
framework 20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.426853 19441 slave.cpp:2776] Forwarding the update TASK_LOST
(UUID: f013f24c-5f2d-4623-82e8-b96f46bb3143) for task 0 of framework
20150706-154445-168431176-5050-19360-0000 to [email protected]:5050
I0706 17:24:44.426983 19478 status_update_manager.cpp:317] Received status
update TASK_LOST (UUID: 73b14123-8bc2-443b-a04b-89bfe3ff9893) for task 1 of
framework 20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.427263 19478 status_update_manager.cpp:317] Received status
update TASK_LOST (UUID: 5c3420c3-9082-4904-8a6e-61b1a0ec52ca) for task 2 of
framework 20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.427326 19441 slave.cpp:2776] Forwarding the update TASK_LOST
(UUID: 73b14123-8bc2-443b-a04b-89bfe3ff9893) for task 1 of framework
20150706-154445-168431176-5050-19360-0000 to [email protected]:5050
I0706 17:24:44.427475 19441 slave.cpp:2776] Forwarding the update TASK_LOST
(UUID: 5c3420c3-9082-4904-8a6e-61b1a0ec52ca) for task 2 of framework
20150706-154445-168431176-5050-19360-0000 to [email protected]:5050
I0706 17:24:44.427508 19478 status_update_manager.cpp:317] Received status
update TASK_LOST (UUID: 475a9f26-6f0e-4c11-a959-8be83a633101) for task 3 of
framework 20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.427753 19449 slave.cpp:2776] Forwarding the update TASK_LOST
(UUID: 475a9f26-6f0e-4c11-a959-8be83a633101) for task 3 of framework
20150706-154445-168431176-5050-19360-0000 to [email protected]:5050
I0706 17:24:44.427798 19478 status_update_manager.cpp:317] Received status
update TASK_LOST (UUID: dedc265d-c09b-4311-a77f-d9ab4f53960f) for task 4 of
framework 20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.428028 19449 slave.cpp:2776] Forwarding the update TASK_LOST
(UUID: dedc265d-c09b-4311-a77f-d9ab4f53960f) for task 4 of framework
20150706-154445-168431176-5050-19360-0000 to [email protected]:5050
I0706 17:24:44.431067 19471 slave.cpp:1768] Asked to shut down framework
20150706-154445-168431176-5050-19360-0000 by [email protected]:5050
I0706 17:24:44.431099 19471 slave.cpp:1793] Shutting down framework
20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.431450 19471 slave.cpp:3332] Cleaning up executor 'default' of
framework 20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.431660 19476 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20150706-154445-168431176-5050-19360-S0/frameworks/20150706-154445-168431176-5050-19360-0000/executors/default/runs/d235751e-986c-44ae-a6c9-953814dac2f8'
for gc 6.99999500509333days in the future
I0706 17:24:44.432011 19471 slave.cpp:3411] Cleaning up framework
20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.432106 19476 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20150706-154445-168431176-5050-19360-S0/frameworks/20150706-154445-168431176-5050-19360-0000/executors/default'
for gc 6.99999500129482days in the future
I0706 17:24:44.432111 19468 status_update_manager.cpp:279] Closing status
update streams for framework 20150706-154445-168431176-5050-19360-0000
I0706 17:24:44.432212 19476 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20150706-154445-168431176-5050-19360-S0/frameworks/20150706-154445-168431176-5050-19360-0000'
for gc 6.99999499845037days in the future
I0706 17:25:32.152350 19462 slave.cpp:3648] Current disk usage 15.70%. Max
allowed age: 5.201143517953102days
I0706 17:25:44.240165 19446 slave.cpp:3564] Framework
20150706-154445-168431176-5050-19360-0000 seems to have exited. Ignoring
registration timeout for executor 'default'
{code}
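The slave log above reports that the executor "exited with status 127". As a rough aid for triage, the sketch below pulls that exit status out of a slave log and attaches the conventional shell meaning (127 usually means the command could not be found). The function name, the regular expression, and the hint table are illustrative assumptions and are not part of Mesos; the pattern only assumes the log wording shown above.

```python
import re

# Matches the slave log entry whether or not the mail client wrapped it, e.g.
#   Executor 'default' of framework <id> exited with status 127
# The wording is taken from the log above; the pattern itself is an assumption.
EXIT_RE = re.compile(
    r"Executor '(?P<executor>[^']+)' of framework\s+(?P<framework>\S+)\s+"
    r"exited with status (?P<status>\d+)"
)

# Conventional shell exit-status meanings. These are hints for a human reader,
# not something Mesos itself reports.
HINTS = {
    126: "command found but not executable",
    127: "command not found",
}


def find_executor_exits(log_text):
    """Return a list of (executor, framework, status, hint) tuples."""
    results = []
    for match in EXIT_RE.finditer(log_text):
        status = int(match.group("status"))
        hint = HINTS.get(status, "no conventional meaning known")
        results.append(
            (match.group("executor"), match.group("framework"), status, hint)
        )
    return results


if __name__ == "__main__":
    # Two wrapped lines copied from the slave log in this report.
    sample = (
        "I0706 17:24:44.392073 19450 slave.cpp:3223] Executor 'default' of framework\n"
        "20150706-154445-168431176-5050-19360-0000 exited with status 127\n"
    )
    for executor, framework, status, hint in find_executor_exits(sample):
        print(f"executor={executor} framework={framework} status={status} ({hint})")
```

If the hint applies here, the next thing to check would be whether the executor command used by test-framework exists and is runnable on the slave host.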