[
https://issues.apache.org/jira/browse/MESOS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496309#comment-16496309
]
Matthew Mead-Briggs commented on MESOS-7966:
--------------------------------------------
This is great sleuthing!
Probably of note here is that for PaaSTA we do use dynamic reservations via the
API to try to prevent tasks from getting scheduled on hosts under maintenance.
I'm actually looking at a way to change how we do this, but the rough idea of
how we do it now is:
* mark host for maintenance
* reserve all the resources with a dummy role
* PaaSTA scales up the affected Marathon apps and kills off tasks on the
affected host
* after each task is killed we reserve the resources we've just freed up
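For reference, the request bodies behind those steps could look roughly like the sketch below. It is not PaaSTA's actual code. It assumes the pre-1.4 operator endpoints /master/maintenance/schedule (JSON body) and /master/reserve (form-encoded body) and the resource format from the 1.1-era docs, so please check it against the docs for your version. The role "maintenance", the principal "ops", and the function names are all made up for illustration. Nothing is sent over the network; the functions only build the payloads.

```python
import json
from urllib.parse import urlencode

NANOS_PER_SECOND = 1_000_000_000


def maintenance_schedule(hostname, ip, start_s, duration_s):
    """Build a JSON body for POST /master/maintenance/schedule (assumed shape).

    start_s is a Unix timestamp in seconds; duration_s is in seconds.
    """
    return {
        "windows": [{
            "machine_ids": [{"hostname": hostname, "ip": ip}],
            "unavailability": {
                "start": {"nanoseconds": start_s * NANOS_PER_SECOND},
                "duration": {"nanoseconds": duration_s * NANOS_PER_SECOND},
            },
        }]
    }


def reserve_form(agent_id, resources, role="maintenance", principal="ops"):
    """Build a form body for POST /master/reserve (assumed shape).

    resources maps a scalar resource name (e.g. "cpus") to an amount.
    role and principal are hypothetical placeholder values.
    """
    reserved = [
        {
            "name": name,
            "type": "SCALAR",
            "scalar": {"value": amount},
            "role": role,
            "reservation": {"principal": principal},
        }
        for name, amount in sorted(resources.items())
    ]
    return urlencode({"slaveId": agent_id, "resources": json.dumps(reserved)})
```

The same reserve_form helper would cover both the initial "reserve everything that is free" step and the per-task step, called once with the resources each killed task has just released.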
I wasn't aware that Marathon had its own reasons for doing dynamic
reservations. Do you have any details you can share on why it does or a link to
some code?
> check for maintenance on agent causes fatal error
> -------------------------------------------------
>
> Key: MESOS-7966
> URL: https://issues.apache.org/jira/browse/MESOS-7966
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 1.1.0
> Reporter: Rob Johnson
> Assignee: Benno Evers
> Priority: Critical
> Labels: mesosphere, reliability
>
> We interact with the maintenance API frequently to orchestrate gracefully
> draining agents of tasks without impacting service availability.
> Occasionally we seem to trigger a fatal error in Mesos when interacting with
> the API. This happens relatively frequently, and impacts us when downstream
> frameworks (Marathon) react badly to leader elections.
> Here is the log line that we see when the master dies:
> {code}
> F0911 12:18:49.543401 123748 hierarchical.cpp:872] Check failed: slaves[slaveId].maintenance.isSome()
> {code}
> It's quite possible we're using the maintenance API in the wrong way. We're
> happy to provide any other logs you need - please let me know what would be
> useful for debugging.
> Thanks.