[ 
https://issues.apache.org/jira/browse/MESOS-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15363396#comment-15363396
 ] 

Adam B commented on MESOS-5693:
-------------------------------

It could be that the agent was out of contact with the master and so could not 
forward the update, but that is implausible for a whole hour. More likely, the 
scheduler was disconnected for a long time: since the scheduler never 
acknowledged the previous status update (TASK_RUNNING?), the agent never sent 
the next update in the queue. For Mesos to guarantee at-least-once delivery of 
status updates to schedulers, the scheduler must be connected so that it can 
ACK each update.
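
To illustrate the queueing behaviour described above, here is a toy model in 
Python. It is only a sketch, not the actual Mesos code: the class 
StatusUpdateStream, its methods, and the forward callback are all hypothetical 
names. It assumes one queue per task, with only the oldest unacknowledged 
update ever in flight.

```python
from collections import deque


class StatusUpdateStream:
    """Toy model of one task's status update queue on the agent.

    Hypothetical sketch for illustration; not taken from Mesos.
    """

    def __init__(self, forward):
        self.pending = deque()  # unacknowledged updates, oldest first
        self.forward = forward  # callback standing in for "send to master"

    def update(self, uuid, state):
        self.pending.append((uuid, state))
        # Only the head of the queue is ever in flight.
        if len(self.pending) == 1:
            self.forward(uuid, state)

    def acknowledge(self, uuid):
        # Ignore ACKs that do not match the in-flight update.
        if not self.pending or self.pending[0][0] != uuid:
            return False
        self.pending.popleft()
        # The next queued update is released only now.
        if self.pending:
            self.forward(*self.pending[0])
        return True

    def retry(self):
        # Resend the in-flight update, as a retry timer would.
        if self.pending:
            self.forward(*self.pending[0])


sent = []
stream = StatusUpdateStream(lambda uuid, state: sent.append(state))
stream.update("u1", "TASK_RUNNING")
stream.update("u2", "TASK_KILLED")  # queued behind the unACKed TASK_RUNNING
stream.retry()                      # still resends TASK_RUNNING only
stream.acknowledge("u1")            # scheduler ACKs; TASK_KILLED goes out
```

In this model TASK_KILLED is accepted and queued immediately, but it is not 
forwarded until the earlier update is acknowledged, which would match a long 
gap between the "Handling status update" and "Forwarding the update" log lines.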

> slave delay to forward status update
> ------------------------------------
>
>                 Key: MESOS-5693
>                 URL: https://issues.apache.org/jira/browse/MESOS-5693
>             Project: Mesos
>          Issue Type: Improvement
>          Components: slave
>    Affects Versions: 0.22.1
>         Environment: debian7 
>            Reporter: zhangfuxing
>
> We observe that the Mesos slave delays forwarding task status updates to the 
> master:
> I0615 14:59:10.997902  3890 slave.cpp:2531] Handling status update 
> TASK_KILLED (UUID: 17e9c12f-5241-4aca-81fa-67d6830990b0) for task 
> xxx.64554b80 of framework 20150629-151659-3355508746-5060-6173-0001 from 
> executor(1)@10.0.40.189:54304
> I0615 14:59:11.001126  3895 status_update_manager.cpp:317] Received status 
> update TASK_KILLED (UUID: 17e9c12f-5241-4aca-81fa-67d6830990b0) for task 
> xxx.64554b80 of framework 20150629-151659-3355508746-5060-6173-0001
> I0615 14:59:11.001174  3895 status_update_manager.hpp:346] Checkpointing 
> UPDATE for status update TASK_KILLED (UUID: 
> 17e9c12f-5241-4aca-81fa-67d6830990b0) for task xxx.64554b80 of framework 
> 20150629-151659-3355508746-5060-6173-0001
> I0615 14:59:11.037376  3894 slave.cpp:2709] Sending acknowledgement for 
> status update TASK_KILLED (UUID: 17e9c12f-5241-4aca-81fa-67d6830990b0) for 
> task xxx.64554b80 of framework 20150629-151659-3355508746-5060-6173-0001 to 
> executor(1)@10.0.40.189:54304
> I0615 15:54:21.352087  3888 slave.cpp:2776] Forwarding the update TASK_KILLED 
> (UUID: 17e9c12f-5241-4aca-81fa-67d6830990b0) for task xxx.64554b80 of 
> framework 20150629-151659-3355508746-5060-6173-0001 to master@10.0.1.200:5060
> In this example, the task xxx.64554b80 was killed at 14:59, but the status 
> update was not forwarded to the master until 15:54.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
