Yes, I am running the code from the master branch.
There are 3 frameworks running at the same time: one is the JobTracker, and
the other two are Spark jobs.

I will try to grep the master log from around that time. Hopefully it will help.

Guodong


On Fri, Jun 14, 2013 at 5:30 AM, Benjamin Mahler
<[email protected]> wrote:

> Looks like you're running off the master branch?
>
> Interestingly, the master asked the slave to shut down the framework before
> the discarded log messages happened:
>
> I0608 02:33:49.211230 24157 slave.cpp:1106] Asked to shut down framework
> 201306071736-252063498-5050-19065-0044 by [email protected]:5050
>
> Then I see another framework is launching tasks (note the framework ID is
> different):
> I0608 02:35:07.981590 24170 slave.cpp:824] Launching task Task_Tracker_97
> for framework 201306071736-252063498-5050-19065-0000
>
> 1. Do you have the master logs available?
>
> 2. It seems you're running two or more frameworks in your cluster, correct?
>
> I'm not sure how the discarded future is occurring at the moment; more
> information will help. :)
>
>
>
>
>
> On Wed, Jun 12, 2013 at 7:36 PM, 王国栋 <[email protected]> wrote:
>
> > The slave is not shutting down.
> >
> > FYI, the final part of the slave log is attached.
> >
> > Guodong
> >
> >
> > On Mon, Jun 10, 2013 at 7:21 AM, Benjamin Mahler <
> > [email protected]> wrote:
> >
> >> I've fixed the issue you linked, and what you've mentioned above does not
> >> seem related. Can you provide the full logs? Is the slave shutting down?
> >>
> >>
> >> On Fri, Jun 7, 2013 at 10:51 PM, 王国栋 <[email protected]> wrote:
> >>
> >> > Hi guys,
> >> >
> >> > *One of the slaves in our cluster is blocked.*
> >> >
> >> > *I can see a lot of logs like this.*
> >> > *W0608 03:33:27.404191 24158 monitor.cpp:167] Failed to collect resource
> >> > usage for executor 'executor_Task_Tracker_103' of framework
> >> > '201306071736-252063498-5050-19065-0000': Future discarded*
> >> > *W0608 03:33:30.907639 24159 monitor.cpp:167] Failed to collect resource
> >> > usage for executor 'executor_Task_Tracker_97' of framework
> >> > '201306071736-252063498-5050-19065-0000': Future discarded*
> >> > *W0608 03:33:32.405814 24155 monitor.cpp:167] Failed to collect resource
> >> > usage for executor 'executor_Task_Tracker_103' of framework
> >> > '201306071736-252063498-5050-19065-0000': Future discarded*
> >> >
> >> > *I find that the slave does launch executor_Task_Tracker_97 and
> >> > executor_Task_Tracker_103, and they work with the JobTracker normally.
> >> > Later, the 2 executors are killed by mesos-master, and I can see logs
> >> > like this:*
> >> >
> >> > *I0608 03:33:22.846537 24162 slave.cpp:983] Asked to kill task
> >> > Task_Tracker_103 of framework 201306071736-252063498-5050-19065-0000*
> >> > *I0608 03:33:22.846947 24157 slave.cpp:983] Asked to kill task
> >> > Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000*
> >> > *I0608 03:33:22.984558 24160 status_update_manager.cpp:290] Received
> >> > status update TASK_FINISHED (UUID: 67a885a6-d121-41e9-9e65-810933533fe3)
> >> > for task Task_Tracker_97 of framework
> >> > 201306071736-252063498-5050-19065-0000 with checkpoint=false*
> >> > *I0608 03:33:22.984678 24160 status_update_manager.cpp:336] Forwarding
> >> > status update TASK_FINISHED (UUID: 67a885a6-d121-41e9-9e65-810933533fe3)
> >> > for task Task_Tracker_97 of framework
> >> > 201306071736-252063498-5050-19065-0000 to [email protected]:5050*
> >> > *I0608 03:33:22.985399 24160 slave.cpp:1794] Sending acknowledgement for
> >> > status update TASK_FINISHED (UUID: 67a885a6-d121-41e9-9e65-810933533fe3)
> >> > for task Task_Tracker_97 of framework
> >> > 201306071736-252063498-5050-19065-0000 to executor(1)@10.47.6.16:60089*
> >> > *I0608 03:33:22.986699 24163 status_update_manager.cpp:360] Received
> >> > status update acknowledgement 67a885a6-d121-41e9-9e65-810933533fe3 for
> >> > task Task_Tracker_97 of framework 201306071736-252063498-5050-19065-0000*
> >> >
> >> > *After that, the internal state of the slave appears to be wrong. Once
> >> > the executors are killed, there are logs like this:*
> >> > W0608 03:33:27.404191 24158 monitor.cpp:167] Failed to collect resource
> >> > usage for executor 'executor_Task_Tracker_103' of framework
> >> > '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:30.907639 24159 monitor.cpp:167] Failed to collect resource
> >> > usage for executor 'executor_Task_Tracker_97' of framework
> >> > '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:32.405814 24155 monitor.cpp:167] Failed to collect resource
> >> > usage for executor 'executor_Task_Tracker_103' of framework
> >> > '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:35.909579 24164 monitor.cpp:167] Failed to collect resource
> >> > usage for executor 'executor_Task_Tracker_97' of framework
> >> > '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:37.406793 24167 monitor.cpp:167] Failed to collect resource
> >> > usage for executor 'executor_Task_Tracker_103' of framework
> >> > '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:40.910625 24164 monitor.cpp:167] Failed to collect resource
> >> > usage for executor 'executor_Task_Tracker_97' of framework
> >> > '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:42.407618 24167 monitor.cpp:167] Failed to collect resource
> >> > usage for executor 'executor_Task_Tracker_103' of framework
> >> > '201306071736-252063498-5050-19065-0000': Future discarded
> >> > W0608 03:33:45.911341 24165 monitor.cpp:167] Failed to collect resource
> >> > usage for executor 'executor_Task_Tracker_97' of framework
> >> > '201306071736-252063498-5050-19065-0000': Future discarded
> >> >
> >> > *In the end, the slave is blocked and never starts any executor.
> >> > Although the log shows the slave launching executors, I cannot find any
> >> > files in the executor working directory.*
> >> >
> >> > *Is this a known issue? I searched JIRA, but only found this:*
> >> >
> >> >
> >> > http://mail-archives.apache.org/mod_mbox/incubator-mesos-dev/201305.mbox/%3CJIRA.12646311.1367878134469.276019.1367878215622@arcas%3E
> >> >
> >> > Thanks.
> >> >
> >> > Guodong
> >> >
> >>
> >
> >
>
