[ https://issues.apache.org/jira/browse/MESOS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525366#comment-16525366 ]
Vinod Kone commented on MESOS-7966:
-----------------------------------

[~kaysoky] Was this backported to the older supported and affected versions as well? If not, shouldn't we?

> check for maintenance on agent causes fatal error
> -------------------------------------------------
>
>                 Key: MESOS-7966
>                 URL: https://issues.apache.org/jira/browse/MESOS-7966
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.1.3, 1.2.3, 1.3.2, 1.4.1, 1.5.0, 1.6.0
>            Reporter: Rob Johnson
>            Assignee: Benno Evers
>            Priority: Critical
>              Labels: mesosphere, reliability
>             Fix For: 1.7.0
>
>
> We interact with the maintenance API frequently to orchestrate graceful
> draining of tasks from agents without impacting service availability.
> Occasionally we seem to trigger a fatal error in Mesos when interacting with
> the API. This happens relatively frequently, and it impacts us because
> downstream frameworks (Marathon) react badly to leader elections.
> Here is the log line that we see when the master dies:
> {code}
> F0911 12:18:49.543401 123748 hierarchical.cpp:872] Check failed: slaves[slaveId].maintenance.isSome()
> {code}
> It's quite possible we're using the maintenance API in the wrong way. We're
> happy to provide any other logs you need; please let me know what would be
> useful for debugging.
> Thanks.