I've fixed the issue you linked, and what you've mentioned above does not seem related. Can you provide the full logs? Is the slave shutting down?

On Fri, Jun 7, 2013 at 10:51 PM, 王国栋 <[email protected]> wrote:

> Hi guys,
>
> One of the slaves in our cluster is blocked.
>
> I can see a lot of logs like this:
>
> W0608 03:33:27.404191 24158 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> W0608 03:33:30.907639 24159 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> W0608 03:33:32.405814 24155 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
>
> I find that the slave does launch executor_Task_Tracker_97 and executor_Task_Tracker_103, and they do work with the jobtracker normally. Later, the 2 executors are killed by mesos-master, and I can see logs like this:
>
> I0608 03:33:22.846537 24162 slave.cpp:983] Asked to kill task Task_Tracker_103 of framework 201306071736-252063498-5050-19065-0000
> I0608 03:33:22.846947 24157 slave.cpp:983] Asked to kill task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000
> I0608 03:33:22.984558 24160 status_update_manager.cpp:290] Received status update TASK_FINISHED (UUID: 67a885a6-d121-41e9-9e65-810933533fe3) for task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000 with checkpoint=false
> I0608 03:33:22.984678 24160 status_update_manager.cpp:336] Forwarding status update TASK_FINISHED (UUID: 67a885a6-d121-41e9-9e65-810933533fe3) for task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000 to [email protected]:5050
> I0608 03:33:22.985399 24160 slave.cpp:1794] Sending acknowledgement for status update TASK_FINISHED (UUID: 67a885a6-d121-41e9-9e65-810933533fe3) for task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000 to executor(1)@10.47.6.16:60089
> I0608 03:33:22.986699 24163 status_update_manager.cpp:360] Received status update acknowledgement 67a885a6-d121-41e9-9e65-810933533fe3 for task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000
>
> Then the internal state of the slave is wrong. After the executors are killed, there are logs like this:
>
> W0608 03:33:27.404191 24158 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> W0608 03:33:30.907639 24159 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> W0608 03:33:32.405814 24155 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> W0608 03:33:35.909579 24164 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> W0608 03:33:37.406793 24167 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> W0608 03:33:40.910625 24164 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> W0608 03:33:42.407618 24167 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> W0608 03:33:45.911341 24165 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded
>
> In the end, the slave is blocked and never starts any executor. Although I can see in the log that the slave is launching an executor, I cannot find any files in the executor working directory.
>
> Is this a known issue? I searched JIRA, but only found this:
>
> http://mail-archives.apache.org/mod_mbox/incubator-mesos-dev/201305.mbox/%3CJIRA.12646311.1367878134469.276019.1367878215622@arcas%3E
>
> Thanks.
>
> Guodong
