[ 
https://issues.apache.org/jira/browse/MESOS-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-875:
----------------------------------

    Description: 
This is a regression due to the bug fix for MESOS-732: 
https://reviews.apache.org/r/14616/

Now that slave recovery is asynchronous, status updates coming from the 
executors will be ignored since the slave does not know about the framework 
until recovery is completed.

Example:

I1210 20:06:51.633050 54429 slave.cpp:1756] Handling status update 
TASK_FINISHED (UUID: foo) for task T of framework F from executor(1)@IP:PORT
W1210 20:06:51.633128 54429 slave.cpp:1766] Ignoring status update 
TASK_FINISHED (UUID: foo) for task T of framework F for unknown framework F
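
The race above can be reproduced in a toy model. This is only a sketch for illustration, not the code in slave.cpp: ToySlave, StatusUpdate, finishRecovery() and recovering() are hypothetical names, and the log strings merely imitate the lines above. It assumes the slave's set of known frameworks is populated only when asynchronous recovery completes, so an update that arrives earlier hits the "unknown framework" branch.

```cpp
#include <iostream>
#include <set>
#include <string>

// Hypothetical, heavily simplified model of the slave. None of these
// names are taken from the real Mesos sources.
enum class State { RECOVERING, RUNNING };

struct StatusUpdate {
  std::string framework;
  std::string task;
  std::string state;
};

class ToySlave {
public:
  bool recovering() const { return state == State::RECOVERING; }

  // Assumption: frameworks only become known to the slave once the
  // asynchronous recovery has finished.
  void finishRecovery(const std::set<std::string>& recovered) {
    frameworks = recovered;
    state = State::RUNNING;
  }

  // Returns true if the update was accepted, false if it was ignored.
  bool statusUpdate(const StatusUpdate& update) {
    std::cout << "Handling status update " << update.state
              << " for task " << update.task
              << " of framework " << update.framework << std::endl;

    // The problematic check: while recovering, the set is still empty,
    // so a valid update from a live executor is dropped here.
    if (frameworks.count(update.framework) == 0) {
      std::cout << "Ignoring status update " << update.state
                << " for task " << update.task
                << " for unknown framework " << update.framework
                << std::endl;
      return false;
    }

    return true;
  }

private:
  State state = State::RECOVERING;
  std::set<std::string> frameworks;
};
```

In this model, calling statusUpdate() for framework F before finishRecovery({"F"}) returns false, while the same call afterwards returns true. The update itself is identical in both cases; only the timing relative to recovery differs.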

  was:
This is a regression due to the bug fix for MESOS-732: 
https://reviews.apache.org/r/14616/

Now that slave recovery is asynchronous, status updates coming from the 
executors will be ignored since the slave does not know about the framework 
until recovery is completed.

It seems that if an executor is sending updates in between disconnected() and 
reregistered(), these could be dropped anyway.

Example:

I1210 20:06:51.633050 54429 slave.cpp:1756] Handling status update 
TASK_FINISHED (UUID: foo) for task T of framework F from executor(1)@IP:PORT
W1210 20:06:51.633128 54429 slave.cpp:1766] Ignoring status update 
TASK_FINISHED (UUID: foo) for task T of framework F for unknown framework F


> A recovering slave should not ignore valid status updates.
> ----------------------------------------------------------
>
>                 Key: MESOS-875
>                 URL: https://issues.apache.org/jira/browse/MESOS-875
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.16.0
>            Reporter: Benjamin Mahler
>            Assignee: Vinod Kone
>            Priority: Critical
>             Fix For: 0.17.0



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
