Hi guys,

One of the slaves in our cluster is blocked.

I can see a lot of log lines like this:

W0608 03:33:27.404191 24158 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:30.907639 24159 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:32.405814 24155 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
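
To see which executors these warnings concern and how often they repeat, they can be tallied with a small script. This is only a rough sketch: the regular expression assumes the glog-style line layout shown in the excerpt above, and `count_collect_failures` and `SAMPLE` are hypothetical names used for illustration, not part of Mesos.

```python
import re
from collections import Counter

# Assumed layout of the monitor.cpp warning, taken from the log excerpt above.
WARNING_RE = re.compile(
    r"^W\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ monitor\.cpp:\d+\] "
    r"Failed to collect resource usage for executor '(?P<executor>[^']+)' "
    r"of framework '(?P<framework>[^']+)': (?P<reason>.*)$"
)


def count_collect_failures(lines):
    """Count resource-usage collection warnings per (executor, reason)."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.match(line.strip())
        if match:
            counts[(match.group("executor"), match.group("reason"))] += 1
    return counts


# The three warning lines quoted above, used as sample input.
SAMPLE = """\
W0608 03:33:27.404191 24158 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:30.907639 24159 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:32.405814 24155 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
"""

if __name__ == "__main__":
    tally = count_collect_failures(SAMPLE.splitlines())
    for (executor, reason), n in sorted(tally.items()):
        print(f"{executor}: {n} x {reason}")
```

In practice the lines would be read from the slave log file instead of `SAMPLE`; lines that do not match the assumed layout are simply skipped.
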
I found that the slave did launch executor_Task_Tracker_97 and executor_Task_Tracker_103, and they worked with the JobTracker normally. Later, the two executors were killed by mesos-master, and I can see log lines like this:

I0608 03:33:22.846537 24162 slave.cpp:983] Asked to kill task Task_Tracker_103 of framework 201306071736-252063498-5050-19065-0000
I0608 03:33:22.846947 24157 slave.cpp:983] Asked to kill task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000
I0608 03:33:22.984558 24160 status_update_manager.cpp:290] Received status update TASK_FINISHED (UUID: 67a885a6-d121-41e9-9e65-810933533fe3) for task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000 with checkpoint=false
I0608 03:33:22.984678 24160 status_update_manager.cpp:336] Forwarding status update TASK_FINISHED (UUID: 67a885a6-d121-41e9-9e65-810933533fe3) for task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000 to [email protected]:5050
I0608 03:33:22.985399 24160 slave.cpp:1794] Sending acknowledgement for status update TASK_FINISHED (UUID: 67a885a6-d121-41e9-9e65-810933533fe3) for task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000 to executor(1)@10.47.6.16:60089
I0608 03:33:22.986699 24163 status_update_manager.cpp:360] Received status update acknowledgement 67a885a6-d121-41e9-9e65-810933533fe3 for task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000

After that, the internal state of the slave is wrong. Once the executors have been killed, there are log lines like this:

W0608 03:33:27.404191 24158 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:30.907639 24159 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:32.405814 24155 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:35.909579 24164 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:37.406793 24167 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:40.910625 24164 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:42.407618 24167 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
W0608 03:33:45.911341 24165 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded

In the end, the slave is blocked and never starts any executor. Although the slave log says it is launching an executor, I cannot find any files in the executor's working directory.

Is this a known issue? I searched JIRA, but only found this:
http://mail-archives.apache.org/mod_mbox/incubator-mesos-dev/201305.mbox/%3CJIRA.12646311.1367878134469.276019.1367878215622@arcas%3E

Thanks.
Guodong
