[jira] [Commented] (MESOS-9277) UNRESERVE scheduler call be dropped if it loses the race with TEARDOWN.

Benjamin Mahler (JIRA) Mon, 15 Oct 2018 01:03:30 -0700


    [ 
https://issues.apache.org/jira/browse/MESOS-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649866#comment-16649866
 ]


Benjamin Mahler commented on MESOS-9277:
----------------------------------------

Are you referring to the event stream?

This ticket is about scheduler::Call sequencing; each call is an HTTP request. 
Solution 2 is only applicable when the scheduler uses HTTP pipelining for its 
Call requests. I do think some sequencing makes sense, and we added sequencing 
in the HTTP server itself to prevent re-ordering across the authentication and 
authorization boundaries, which protects against an authenticator or authorizer 
that re-orders requests.
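
To make the sequencing idea concrete, here is a minimal sketch of per-framework 
call serialization. It is written in Python with asyncio purely for 
illustration and is not Mesos or libprocess code; the `Sequence` class, its 
`add` method, and the `unreserve`/`teardown` stand-ins are all hypothetical 
names. It only borrows the general idea of process::Sequence: a call starts 
once the previously added call has completed.

```python
import asyncio


class Sequence:
    """Runs submitted coroutine functions one at a time, in submission order.

    Hypothetical helper that only mimics the idea of a per-framework
    sequence; it is not the libprocess API.
    """

    def __init__(self):
        self._last = None  # Task for the most recently added call, if any.

    def add(self, make_coro):
        previous = self._last

        async def run():
            if previous is not None:
                # Wait until the previous call completes (success or failure).
                await asyncio.wait([previous])
            return await make_coro()

        # Must be called from within a running event loop.
        self._last = asyncio.create_task(run())
        return self._last


async def demo():
    log = []
    seq = Sequence()

    async def unreserve():
        log.append("UNRESERVE started")
        await asyncio.sleep(0.05)  # Stands in for multi-stage processing.
        log.append("UNRESERVE finished")

    async def teardown():
        log.append("TEARDOWN started")
        log.append("TEARDOWN finished")

    # Both calls are submitted back to back, as with pipelined requests.
    await asyncio.gather(seq.add(unreserve), seq.add(teardown))
    return log


if __name__ == "__main__":
    for line in asyncio.run(demo()):
        print(line)
```

Note that this only works because each handler exposes a completion signal (the 
task) that the next call can wait on, and it only helps when the calls arrive 
in a well-defined order in the first place.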

However, I think that in the example provided in this ticket (unreserve, then 
tear down), sequencing is not the answer. To enable reliable completion of 
operations prior to tearing down, there needs to be feedback. Alternatively, we 
can tie more objects (reservations in this case) to the framework, so that we 
can clean them up. (That is tough, however, in the case where the scheduler 
leaves a dangling volume: we would need a tombstone mechanism instead of just 
deleting it.)
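
As a rough illustration of the feedback approach, a scheduler could track the 
operations it has sent and hold back TEARDOWN until each one is reported as 
finished. The sketch below is Python and purely hypothetical: the 
`TeardownGate` class and its methods are invented for this example, and the 
state strings are assumed to match the operation states reported by the Mesos 
version in use.

```python
class TeardownGate:
    """Tracks in-flight operations and reports when teardown is safe.

    Hypothetical scheduler-side helper; not part of any Mesos API.
    """

    # Assumed name of the successful terminal state; verify it against
    # the Mesos version in use.
    FINISHED = "OPERATION_FINISHED"

    def __init__(self):
        self._pending = set()

    def sent(self, operation_id):
        # Record an operation (e.g. UNRESERVE) sent with this operation id.
        self._pending.add(operation_id)

    def status_update(self, operation_id, state):
        # Only a finished operation is dropped from the pending set; any
        # other state keeps the gate closed so the scheduler can retry.
        if state == self.FINISHED:
            self._pending.discard(operation_id)

    def can_teardown(self):
        return not self._pending


gate = TeardownGate()
gate.sent("unreserve-1")
gate.status_update("unreserve-1", "OPERATION_PENDING")
print(gate.can_teardown())
gate.status_update("unreserve-1", "OPERATION_FINISHED")
print(gate.can_teardown())
```

The scheduler would send TEARDOWN only once `can_teardown()` returns true, 
which closes the race without relying on request ordering.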

cc [~gkleiman] [~greggomann]

> UNRESERVE scheduler call be dropped if it loses the race with TEARDOWN. 
> ------------------------------------------------------------------------
>
>                 Key: MESOS-9277
>                 URL: https://issues.apache.org/jira/browse/MESOS-9277
>             Project: Mesos
>          Issue Type: Bug
>          Components: scheduler api
>    Affects Versions: 1.5.1, 1.6.1, 1.7.0
>            Reporter: Alexander Rukletsov
>            Priority: Major
>              Labels: mesosphere, v1_api
>
> A typical use pattern for a framework scheduler is to remove its reservations 
> before tearing itself down. However, it is racy: {{UNRESERVE}} is a 
> multi-stage action which aborts if the framework is removed in-between.
> *Solution 1*
> Let schedulers use operation feedback and expect them to wait for an ack for 
> {{UNRESERVE}} before they send {{TEARDOWN}}. This is a distant prospect, with 
> a timeline of {{O(months)}}, and the race remains possible if a scheduler 
> does not comply.
> *Solution 2*
> Serialize calls for schedulers. For example, we can chain [handlers 
> here|https://github.com/apache/mesos/blob/6e21e94ddca5b776d44636fe3eba8500bf88dc25/src/master/http.cpp#L640-L711]
>  onto per-{{Master::Framework}} 
> [{{process::Sequence}}|https://github.com/apache/mesos/blob/6e21e94ddca5b776d44636fe3eba8500bf88dc25/3rdparty/libprocess/include/process/sequence.hpp].
>  For that, however, handlers must provide futures indicating when the 
> processing of the call is finished. Note that most [handlers 
> here|https://github.com/apache/mesos/blob/6e21e94ddca5b776d44636fe3eba8500bf88dc25/src/master/http.cpp#L640-L711]
>  return void.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
