Yes, I am running the code from the master branch. There are 3 frameworks running at the same time: one is the JobTracker, and the other two are Spark jobs.
I will try to grep the master logs from that time. Hopefully they will help.

Guodong

On Fri, Jun 14, 2013 at 5:30 AM, Benjamin Mahler <[email protected]> wrote:

> Looks like you're running off the master branch?
>
> Interestingly, the master asked the slave to shut down the framework before
> the "Future discarded" log messages appeared:
>
> I0608 02:33:49.211230 24157 slave.cpp:1106] Asked to shut down framework 201306071736-252063498-5050-19065-0044 by [email protected]:5050
>
> Then I see another framework launching tasks (note that the framework ID is
> different):
>
> I0608 02:35:07.981590 24170 slave.cpp:824] Launching task Task_Tracker_97 for framework 201306071736-252063498-5050-19065-0000
>
> 1. Do you have the master logs available?
>
> 2. It seems you're running two or more frameworks in your cluster, correct?
>
> I'm not sure how the discard is occurring at the moment; more information
> will help. :)
>
> On Wed, Jun 12, 2013 at 7:36 PM, 王国栋 <[email protected]> wrote:
>
> > The slave is not shutting down.
> >
> > FYI, the final part of the slave log is attached.
> >
> > Guodong
> >
> > On Mon, Jun 10, 2013 at 7:21 AM, Benjamin Mahler <[email protected]> wrote:
> >
> >> I've fixed the issue you linked, and what you've mentioned above does not
> >> seem related. Can you provide the full logs? Is the slave shutting down?
> >>
> >> On Fri, Jun 7, 2013 at 10:51 PM, 王国栋 <[email protected]> wrote:
> >>
> >> > Hi guys,
> >> >
> >> > One of the slaves in our cluster is blocked.
> >> >
> >> > I can see a lot of log lines like this:
> >> >
> >> > W0608 03:33:27.404191 24158 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:30.907639 24159 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:32.405814 24155 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> >> >
> >> > I found that the slave did launch executor_Task_Tracker_97 and
> >> > executor_Task_Tracker_103, and both worked normally with the JobTracker.
> >> > Later, the two executors were killed by the mesos-master.
> >> > I can see log lines like this:
> >> >
> >> > I0608 03:33:22.846537 24162 slave.cpp:983] Asked to kill task Task_Tracker_103 of framework 201306071736-252063498-5050-19065-0000
> >> > I0608 03:33:22.846947 24157 slave.cpp:983] Asked to kill task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000
> >> > I0608 03:33:22.984558 24160 status_update_manager.cpp:290] Received status update TASK_FINISHED (UUID: 67a885a6-d121-41e9-9e65-810933533fe3) for task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000 with checkpoint=false
> >> > I0608 03:33:22.984678 24160 status_update_manager.cpp:336] Forwarding status update TASK_FINISHED (UUID: 67a885a6-d121-41e9-9e65-810933533fe3) for task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000 to [email protected]:5050
> >> > I0608 03:33:22.985399 24160 slave.cpp:1794] Sending acknowledgement for status update TASK_FINISHED (UUID: 67a885a6-d121-41e9-9e65-810933533fe3) for task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000 to executor(1)@10.47.6.16:60089
> >> > I0608 03:33:22.986699 24163 status_update_manager.cpp:360] Received status update acknowledgement 67a885a6-d121-41e9-9e65-810933533fe3 for task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000
> >> >
> >> > Then the internal state of the slave goes wrong.
> >> > After the executors are killed, there are log lines like this:
> >> >
> >> > W0608 03:33:27.404191 24158 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:30.907639 24159 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:32.405814 24155 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:35.909579 24164 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:37.406793 24167 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:40.910625 24164 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:42.407618 24167 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_103' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:45.911341 24165 monitor.cpp:167] Failed to collect resource usage for executor 'executor_Task_Tracker_97' of framework '201306071736-252063498-5050-19065-0000': Future discarded
> >> >
> >> > In the end, the slave is blocked and never starts any executor.
> >> > Although the log shows the slave launching executors, I cannot find any
> >> > files in the executor working directory.
> >> >
> >> > Is this a known issue? I searched JIRA but only found this:
> >> >
> >> > http://mail-archives.apache.org/mod_mbox/incubator-mesos-dev/201305.mbox/%3CJIRA.12646311.1367878134469.276019.1367878215622@arcas%3E
> >> >
> >> > Thanks.
> >> >
> >> > Guodong
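Since Benjamin asked for fuller logs and Guodong planned to grep them, here is a minimal Python sketch for summarizing the slave's `monitor.cpp` warnings. It assumes each log entry sits on a single line in glog's usual `<severity><MMDD> <time> <thread> <file>:<line>] <message>` layout (an assumption, not something stated in this thread); the helper names `parse_glog` and `count_discarded` are hypothetical, and the warning pattern is copied from the log lines quoted above.

```python
import re
from collections import Counter

# Assumed glog layout: <severity><MMDD> <HH:MM:SS.micros> <thread id> <file>:<line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[\w.]+):(?P<line>\d+)\] (?P<message>.*)$"
)

# Warning text copied from the slave log quoted in this thread.
DISCARDED = re.compile(
    r"Failed to collect resource usage for executor '(?P<executor>[^']+)' "
    r"of framework '(?P<framework>[^']+)': Future discarded"
)


def parse_glog(line):
    """Return the fields of one glog line as a dict, or None if it does not match."""
    match = GLOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_discarded(lines):
    """Count 'Future discarded' warnings per (executor, framework) pair."""
    counts = Counter()
    for line in lines:
        record = parse_glog(line)
        if record is None or record["severity"] != "W":
            continue  # skip non-glog lines and anything that is not a warning
        hit = DISCARDED.search(record["message"])
        if hit:
            counts[(hit["executor"], hit["framework"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "W0608 03:33:27.404191 24158 monitor.cpp:167] Failed to collect resource usage "
        "for executor 'executor_Task_Tracker_103' of framework "
        "'201306071736-252063498-5050-19065-0000': Future discarded",
        "I0608 03:33:22.846537 24162 slave.cpp:983] Asked to kill task Task_Tracker_103 "
        "of framework 201306071736-252063498-5050-19065-0000",
    ]
    for (executor, framework), n in sorted(count_discarded(sample).items()):
        print(f"{executor} ({framework}): {n}")
```

Run over a full slave log, the counts would show which executors the monitor was still trying to sample after their tasks had finished; the same `parse_glog` helper could also be used to filter a master log by framework ID.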

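Benjamin's note that the framework ID "is different" is easier to see once the two IDs are split apart. A minimal sketch, assuming (not confirmed in this thread) that everything before the last dash identifies the master instance and the trailing digits are a per-master framework counter; `FrameworkId` and `split_framework_id` are hypothetical names.

```python
from typing import NamedTuple


class FrameworkId(NamedTuple):
    master_id: str  # everything before the last dash (assumed to name the master instance)
    sequence: int   # trailing digits (assumed to be a per-master registration counter)


def split_framework_id(framework_id: str) -> FrameworkId:
    """Split an ID such as '201306071736-252063498-5050-19065-0044' at its last dash."""
    prefix, _, suffix = framework_id.rpartition("-")
    if not prefix or not suffix.isdigit():
        raise ValueError(f"unexpected framework ID: {framework_id!r}")
    return FrameworkId(prefix, int(suffix))


# The two IDs from the slave log lines Benjamin quoted.
shut_down = split_framework_id("201306071736-252063498-5050-19065-0044")
launching = split_framework_id("201306071736-252063498-5050-19065-0000")
print(shut_down.master_id == launching.master_id)  # True
print(shut_down.sequence, launching.sequence)      # 44 0
```

Under that assumption, the shutdown at 02:33:49 targeted framework number 44, while Task_Tracker_97 was launched for framework number 0 registered with the same master, which is consistent with more than one framework sharing the cluster.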