[jira] [Updated] (MESOS-6445) Reconciliation for unreachable agent after master failover is incorrect

Vinod Kone (JIRA) Tue, 25 Oct 2016 12:23:49 -0700

     [ https://issues.apache.org/jira/browse/MESOS-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Vinod Kone updated MESOS-6445:
------------------------------
    Fix Version/s:     (was: 1.2.0)
                   1.1.0

Cherry-picked for 1.1.0.

commit e1b6b2145a82a0f21aed82fc335ea09965b2f2dc
Author: Vinod Kone <[email protected]>
Date:   Tue Oct 25 12:20:39 2016 -0700

    Added MESOS-6445 to CHANGELOG for 1.1.0.

commit c6516be5df87e3fc8bea67f4dc74bc6a4743147d
Author: Neil Conway <[email protected]>
Date:   Fri Oct 21 14:18:59 2016 -0700

    Tweaked test expectation.
    
    `WillOnce` is more accurate than `WillRepeatedly`.
    
    Review: https://reviews.apache.org/r/53098/

commit 0abd9510a0dba87d1a791d8751e5ccdbb02784db
Author: Neil Conway <[email protected]>
Date:   Fri Oct 21 14:18:52 2016 -0700

    Fixed bug when marking agents unreachable after master failover.
    
    If the master fails over and an agent does not re-register within the
    `agent_reregister_timeout`, the master marks the agent as unreachable in
    the registry and sends `slaveLost` for it. However, we neglected to
    update the master's in-memory state for the newly unreachable agent;
    this meant that task reconciliation would return incorrect results
    (until/unless the next master failover).
    
    Review: https://reviews.apache.org/r/53097/
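
The commit message above describes durable state (the registry) and the master's in-memory state drifting apart. A minimal sketch of that failure mode follows. The `Registry` and `Master` types, the `markUnreachable` and `reconcile` functions, the `updateInMemory` flag, and the string results are all hypothetical stand-ins invented for illustration; they are not the actual Mesos classes or the code from the review.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical durable store; stands in for the replicated registry.
struct Registry {
  std::set<std::string> unreachable;
};

// Hypothetical, heavily simplified master; not the real Mesos class.
struct Master {
  Registry registry;
  std::set<std::string> unreachable;  // In-memory view of unreachable agents.

  // Invoked when an agent misses the re-registration timeout after a
  // failover. `updateInMemory = false` models the behavior before the fix.
  void markUnreachable(const std::string& agentId, bool updateInMemory) {
    registry.unreachable.insert(agentId);
    if (updateInMemory) {
      unreachable.insert(agentId);
    }
  }

  // In this sketch, reconciliation consults only the in-memory view.
  std::string reconcile(const std::string& agentId) const {
    return unreachable.count(agentId) > 0 ? "unreachable" : "unknown";
  }
};

// Usage: contrasts the two behaviors.
void demo() {
  Master before;
  before.markUnreachable("agent-1", false);
  // The registry knows about the agent, but reconciliation does not.
  assert(before.registry.unreachable.count("agent-1") == 1);
  assert(before.reconcile("agent-1") == "unknown");

  Master after;
  after.markUnreachable("agent-1", true);
  assert(after.reconcile("agent-1") == "unreachable");
}
```

In this model a later failover would rebuild the in-memory set from the registry, which is consistent with the commit message's note that the incorrect results lasted only until the next master failover.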

commit 8cf2ca8703d3b776fdbdaac2979cbd3ea40873ad
Author: Neil Conway <[email protected]>
Date:   Fri Oct 21 14:18:46 2016 -0700

    Avoided passing `TimeInfo` by value.
    
    Although this is likely to remain small in practice, passing by const
    reference should be preferred until there is a reason not to.
    
    Review: https://reviews.apache.org/r/53099/
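
To illustrate the point about parameter passing, here is a small sketch that counts copies. `TimeInfoLike`, `copyCount`, `byValue`, and `byConstRef` are hypothetical names made up for this example; the real `TimeInfo` is a generated protobuf message, and this stand-in only mimics a single integer field.

```cpp
#include <cassert>
#include <cstdint>

// Counts how many times the copy constructor below has run.
int copyCount = 0;

// Hypothetical stand-in for a small message type; not the real TimeInfo.
struct TimeInfoLike {
  std::int64_t nanoseconds;

  explicit TimeInfoLike(std::int64_t ns) : nanoseconds(ns) {}

  TimeInfoLike(const TimeInfoLike& other) : nanoseconds(other.nanoseconds) {
    ++copyCount;
  }
};

// Passing by value copies the argument when called with an lvalue.
std::int64_t byValue(TimeInfoLike t) { return t.nanoseconds; }

// Passing by const reference reads the caller's object without copying.
std::int64_t byConstRef(const TimeInfoLike& t) { return t.nanoseconds; }

// Usage: the by-value call triggers one copy, the by-reference call none.
void demoCopies() {
  TimeInfoLike t(42);

  copyCount = 0;
  assert(byConstRef(t) == 42);
  assert(copyCount == 0);

  assert(byValue(t) == 42);
  assert(copyCount == 1);
}
```

For a type this small the cost of a copy is negligible, which matches the commit message: the const reference is a default choice, not a measured optimization.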



> Reconciliation for unreachable agent after master failover is incorrect
> -----------------------------------------------------------------------
>
>                 Key: MESOS-6445
>                 URL: https://issues.apache.org/jira/browse/MESOS-6445
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>            Reporter: Neil Conway
>            Assignee: Neil Conway
>            Priority: Blocker
>              Labels: mesosphere
>             Fix For: 1.1.0
>
>
> {noformat}
>     If the master fails over and an agent does not re-register within the
>     `agent_reregister_timeout`, the master marks the agent as unreachable in
>     the registry and sends `slaveLost` for it. However, we neglected to
>     update the master's in-memory state for the newly unreachable agent;
>     this meant that task reconciliation would return incorrect results
>     (until/unless the next master failover).
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
