[ 
https://issues.apache.org/jira/browse/MESOS-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15511324#comment-15511324
 ] 

Joseph Wu commented on MESOS-6221:
----------------------------------

The race is definitely a possibility.  The current implementation of 
maintenance primitives is an MVP.  We're waiting for some community 
adoption/feedback (particularly framework support) before hardening the feature 
further.  

As part of the MVP, we decided it would be logically simpler to assume only one 
operator does any maintenance, including changing the schedule.  (Note that 
there are TODOs in the codebase about having multiple schedules.  That would be 
one way of isolating two operators.)

> Ability to post maintenance/schedule with better granularity
> ------------------------------------------------------------
>
>                 Key: MESOS-6221
>                 URL: https://issues.apache.org/jira/browse/MESOS-6221
>             Project: Mesos
>          Issue Type: Improvement
>          Components: HTTP API
>            Reporter: Huadong Liu
>
> Currently the maintenance schedule update is at cluster granularity: "To 
> update the maintenance schedule, the operator should first read the current 
> schedule, make any necessary changes, and then post the modified schedule." 
> http://mesos.apache.org/documentation/latest/maintenance/
> In contrast, the machine/down and up endpoints operate at host granularity. 
> One or a set of hosts can be moved to DOWN mode or UP mode once the schedule 
> exists.
> Requiring a GET of the current schedule before POSTing an updated schedule 
> may create races if machine/up and maintenance/schedule updates are issued 
> from different hosts/processes. For example:
> 1. The Mesos master has host A in maintenance DOWN mode.
> 2. Process p1 tries to UP host A.
> 3. Process p2 tries to get the current schedule and then append host B to the 
> schedule.
> 4. The Mesos master may end up with both A and B in maintenance DRAIN mode, 
> although the desired result is to have only B in DRAIN mode.
> I cannot find a document that explains why the maintenance schedule has to 
> be updated at cluster granularity. Although the problem can be resolved by 
> external synchronization, the ability to update the maintenance schedule at 
> host granularity seems a better choice.
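
The interleaving in steps 1-4 can be reproduced with a small in-memory model. This is only a sketch, not the Mesos HTTP API: `ToyMaster` and all of its methods are hypothetical names, the model assumes that bringing a machine up removes it from the schedule, and `add_to_schedule` stands in for the kind of host-granularity update the issue asks for, which does not exist today.

```python
class ToyMaster:
    """Hypothetical in-memory stand-in for the master's maintenance state."""

    def __init__(self):
        # host -> "DRAIN" or "DOWN"; hosts that are not listed are UP.
        self.modes = {}

    def get_schedule(self):
        # Cluster-granularity read: every host currently in the schedule.
        return sorted(self.modes)

    def post_schedule(self, hosts):
        # Cluster-granularity write: replaces the whole schedule. Hosts that
        # are new to the schedule start in DRAIN, hosts already listed keep
        # their mode, and hosts left out return to UP.
        self.modes = {h: self.modes.get(h, "DRAIN") for h in hosts}

    def machine_down(self, host):
        # Host-granularity write: only scheduled hosts can be taken down.
        if host not in self.modes:
            raise ValueError(f"{host} is not in the schedule")
        self.modes[host] = "DOWN"

    def machine_up(self, host):
        # Host-granularity write (assumed behavior): the host leaves the
        # schedule and is UP again.
        self.modes.pop(host, None)

    def add_to_schedule(self, host):
        # Hypothetical host-granularity schedule update (not a real endpoint).
        self.modes.setdefault(host, "DRAIN")


def racy_interleaving():
    master = ToyMaster()
    master.post_schedule(["A"])
    master.machine_down("A")              # step 1: A is DOWN
    stale = master.get_schedule()         # step 3, first half: p2 reads ["A"]
    master.machine_up("A")                # step 2: p1 brings A up
    master.post_schedule(stale + ["B"])   # step 3, second half: p2 posts stale copy + B
    return master.modes                   # step 4: A is back in the schedule


def host_granular_interleaving():
    master = ToyMaster()
    master.post_schedule(["A"])
    master.machine_down("A")
    master.machine_up("A")                # p1 brings A up
    master.add_to_schedule("B")           # p2 touches only B, no stale read involved
    return master.modes


if __name__ == "__main__":
    print("read-modify-write:", racy_interleaving())
    print("host granularity: ", host_granular_interleaving())
```

In this model the read-modify-write path leaves both A and B in DRAIN, because p2 posts a copy of the schedule that predates p1's update, while the host-granularity path leaves only B in DRAIN. External synchronization between p1 and p2 would also avoid the stale write.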



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
