[ https://issues.apache.org/jira/browse/MESOS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240148#comment-16240148 ]
Alexander Rukletsov commented on MESOS-7966:
--------------------------------------------
[~robjohnson], do you still have the master logs?
> check for maintenance on agent causes fatal error
> -------------------------------------------------
>
> Key: MESOS-7966
> URL: https://issues.apache.org/jira/browse/MESOS-7966
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 1.1.0
> Reporter: Rob Johnson
> Assignee: Armand Grillet
> Priority: Blocker
> Labels: reliability
>
> We interact with the maintenance API frequently to gracefully drain tasks
> from agents without impacting service availability.
> Occasionally we seem to trigger a fatal error in Mesos when interacting with
> the API. This happens relatively frequently, and it hurts us because downstream
> frameworks (Marathon) react badly to the resulting leader elections.
> Here is the log line that we see when the master dies:
> {code}
> F0911 12:18:49.543401 123748 hierarchical.cpp:872] Check failed: slaves[slaveId].maintenance.isSome()
> {code}
> It's quite possible we're using the maintenance API in the wrong way. We're
> happy to provide any other logs you need - please let me know what would be
> useful for debugging.
> Thanks.
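For context on the drain workflow the report describes, here is a minimal sketch of the request bodies used in the documented maintenance sequence: POST a schedule to the master's `/maintenance/schedule` endpoint (frameworks then receive inverse offers for those machines and can drain), POST the machine list to `/machine/down` when the window starts, and later to `/machine/up`. It assumes the JSON formats shown in the Mesos maintenance documentation. The helper names, hostnames, IPs and times are hypothetical, no HTTP calls are made, and this is not the reporter's actual tooling.

```python
import json

NANOS_PER_SECOND = 1_000_000_000


def maintenance_schedule(machines, start_s, duration_s):
    """Body for POST /maintenance/schedule with a single unavailability window.

    machines: list of (hostname, ip) tuples (hypothetical values in the demo).
    start_s, duration_s: window start (Unix seconds) and length (seconds);
    the schedule expresses both in nanoseconds.
    """
    return {
        "windows": [
            {
                "machine_ids": [{"hostname": h, "ip": ip} for h, ip in machines],
                "unavailability": {
                    "start": {"nanoseconds": start_s * NANOS_PER_SECOND},
                    "duration": {"nanoseconds": duration_s * NANOS_PER_SECOND},
                },
            }
        ]
    }


def machine_ids(machines):
    """Body for POST /machine/down and /machine/up: a bare list of machine IDs."""
    return [{"hostname": h, "ip": ip} for h, ip in machines]


if __name__ == "__main__":
    agents = [("agent1.example.com", "10.0.0.1")]  # hypothetical agent
    # 1. Schedule a one-hour window so frameworks get inverse offers and drain.
    print(json.dumps(maintenance_schedule(agents, 1505132329, 3600), indent=2))
    # 2. and 3. The same list is posted to /machine/down, then /machine/up.
    print(json.dumps(machine_ids(agents)))
```

Keeping the payload builders separate from the HTTP client makes it easy to log exactly what was sent to the master, which is the kind of detail that helps when correlating API calls with a master crash like the one above.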
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)