Hi guys,

One of the slaves in our cluster is blocked.

I can see a lot of log lines like this:
W0608 03:33:27.404191 24158 monitor.cpp:167] Failed to collect resource
usage for executor 'executor_Task_Tracker_103' of framework
'201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:30.907639 24159 monitor.cpp:167] Failed to collect resource
usage for executor 'executor_Task_Tracker_97' of framework
'201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:32.405814 24155 monitor.cpp:167] Failed to collect resource
usage for executor 'executor_Task_Tracker_103' of framework
'201306071736-252063498-5050-19065-0000': Future discarded

The slave did launch executor_Task_Tracker_97 and
executor_Task_Tracker_103, and they worked with the JobTracker normally.
Later, both executors were killed by mesos-master, and I can see log lines
like this:

I0608 03:33:22.846537 24162 slave.cpp:983] Asked to kill task
Task_Tracker_103 of framework 201306071736-252063498-5050-19065-0000
I0608 03:33:22.846947 24157 slave.cpp:983] Asked to kill task
Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000
I0608 03:33:22.984558 24160 status_update_manager.cpp:290] Received status
update TASK_FINISHED (UUID: 67a885a6-d121-41e9-9e65-810933533fe3) for task
Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000
with checkpoint=false
I0608 03:33:22.984678 24160 status_update_manager.cpp:336] Forwarding
status update TASK_FINISHED (UUID: 67a885a6-d121-41e9-9e65-810933533fe3)
for task Task_Tracker_97 of framework
201306071736-252063498-5050-19065-0000 to [email protected]:5050
I0608 03:33:22.985399 24160 slave.cpp:1794] Sending acknowledgement for
status update TASK_FINISHED (UUID: 67a885a6-d121-41e9-9e65-810933533fe3)
for task Task_Tracker_97 of framework
201306071736-252063498-5050-19065-0000 to executor(1)@10.47.6.16:60089
I0608 03:33:22.986699 24163 status_update_manager.cpp:360] Received status
update acknowledgement 67a885a6-d121-41e9-9e65-810933533fe3 for task
Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000

After that, the internal state of the slave seems to be wrong. Once the
executors have been killed, the slave keeps logging warnings like this:
W0608 03:33:27.404191 24158 monitor.cpp:167] Failed to collect resource
usage for executor 'executor_Task_Tracker_103' of framework
'201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:30.907639 24159 monitor.cpp:167] Failed to collect resource
usage for executor 'executor_Task_Tracker_97' of framework
'201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:32.405814 24155 monitor.cpp:167] Failed to collect resource
usage for executor 'executor_Task_Tracker_103' of framework
'201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:35.909579 24164 monitor.cpp:167] Failed to collect resource
usage for executor 'executor_Task_Tracker_97' of framework
'201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:37.406793 24167 monitor.cpp:167] Failed to collect resource
usage for executor 'executor_Task_Tracker_103' of framework
'201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:40.910625 24164 monitor.cpp:167] Failed to collect resource
usage for executor 'executor_Task_Tracker_97' of framework
'201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:42.407618 24167 monitor.cpp:167] Failed to collect resource
usage for executor 'executor_Task_Tracker_103' of framework
'201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:45.911341 24165 monitor.cpp:167] Failed to collect resource
usage for executor 'executor_Task_Tracker_97' of framework
'201306071736-252063498-5050-19065-0000': Future discarded
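
To see how often each executor hits this warning, the lines above can be tallied with a short script. This is only a rough sketch, not part of Mesos: the function name `count_collection_failures` and the regular expression are hypothetical, written against the exact wording of the `monitor.cpp` warnings pasted above, and it tolerates the line wrapping added by the mail client.

```python
import re
from collections import Counter

# Hypothetical pattern, based only on the warning text quoted in this mail.
# \s+ between words lets it match even when the mail client wrapped the line.
WARNING_RE = re.compile(
    r"Failed\s+to\s+collect\s+resource\s+usage\s+for\s+executor\s+"
    r"'([^']+)'\s+of\s+framework\s+'([^']+)'"
)


def count_collection_failures(log_text):
    """Count 'Failed to collect resource usage' warnings.

    Returns a Counter keyed by (framework_id, executor_id).
    """
    counts = Counter()
    for executor_id, framework_id in WARNING_RE.findall(log_text):
        counts[(framework_id, executor_id)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (the log path is whatever the slave writes to on your machine):
    #   python count_failures.py < slave.log
    for (framework_id, executor_id), n in sorted(
        count_collection_failures(sys.stdin.read()).items()
    ):
        print(framework_id, executor_id, n)
```

A steadily growing count for executors that have already finished would support the suspicion that the slave is still monitoring them after they were killed.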

In the end, the slave is blocked and never starts any executor. The slave
log says it is launching an executor, but I cannot find any files in the
executor working directory.

Is this a known issue? I searched JIRA, but only found this:
http://mail-archives.apache.org/mod_mbox/incubator-mesos-dev/201305.mbox/%3CJIRA.12646311.1367878134469.276019.1367878215622@arcas%3E

Thanks.

Guodong
