[
https://issues.apache.org/jira/browse/MESOS-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708723#comment-16708723
]
Benjamin Bannier commented on MESOS-9448:
-----------------------------------------
Thanks for the additional details, [~gkleiman].
With the current approach we expect the master's HTTP handler to have enough
information to assemble a response immediately, i.e., without deferring to the
agent or resource provider managers. If one wanted to, say, send different
operation statuses for operations on (1) active, but currently unsubscribed
resource providers, and on (2) removed resource providers, we would need to
sync at least resource providers ever active in the cluster to the master.
Currently master and agent communicate via {{UpdateSlaveMessage}}, which is
about _active providers_ and their operations, but not about all providers (both
present and past), which the master would need to distinguish a disconnected
provider from a removed one. Explicitly communicating that information to the
master seems wasteful (after all, a resource provider manager would have this
information already) and potentially not scalable (e.g., to many agents with a
lot of provider churn).
Currently master sends {{OPERATION_UNKNOWN}} for any resource provider it has
not yet seen, which is too coarse-grained for frameworks; see MESOS-9318. It
seems that the current semantics impose a huge cost on improving that.
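
To illustrate the difference, here is a minimal sketch and not Mesos code. The {{Status}} values, both view types, and the provider IDs are hypothetical names made up for this example. It contrasts what a master can answer when it only knows active providers with what it could answer if it also tracked unsubscribed and removed ones.

```cpp
#include <set>
#include <string>

// Hypothetical status values for illustration only; these are not the
// actual Mesos operation states.
enum class Status { KNOWN, UNREACHABLE, GONE, UNKNOWN };

// Roughly what the master can answer when it only learns about active
// providers: everything else collapses into UNKNOWN.
struct ActiveOnlyView {
  std::set<std::string> active;

  Status reconcile(const std::string& providerId) const {
    return active.count(providerId) > 0 ? Status::KNOWN : Status::UNKNOWN;
  }
};

// What the master would need to track in order to answer more precisely:
// every provider ever seen, split by its current state.
struct FullHistoryView {
  std::set<std::string> active;
  std::set<std::string> unsubscribed;
  std::set<std::string> removed;

  Status reconcile(const std::string& providerId) const {
    if (active.count(providerId) > 0) return Status::KNOWN;
    if (unsubscribed.count(providerId) > 0) return Status::UNREACHABLE;
    if (removed.count(providerId) > 0) return Status::GONE;
    return Status::UNKNOWN;
  }
};
```

The second view only works if the {{unsubscribed}} and {{removed}} sets are kept in sync with every agent, which is exactly the cost described above.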
All this would seem much simpler in a world where a call to reconcile operations
would asynchronously trigger operation status update events from all
involved entities (i.e., agents and resource provider managers). Here the master
would defer the work to the entity actually managing that state.
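
A rough sketch of that event-based flow could look like the following. It is a toy model, not Mesos code: {{Request}}, {{Event}}, {{StateOwner}}, {{Master}}, and {{drain}} are hypothetical names, a plain task queue stands in for real asynchronous dispatch, and falling back to {{OPERATION_UNKNOWN}} for unknown owners or operations is an assumption of the sketch.

```cpp
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
struct Request { std::string operationId; std::string ownerId; };
struct Event { std::string operationId; std::string status; };

// Stand-in for an agent or resource provider manager, i.e., the entity
// that actually manages operation state.
struct StateOwner {
  std::map<std::string, std::string> operations; // Operation ID -> status.

  void reconcile(
      const std::string& operationId,
      const std::function<void(const Event&)>& emit) const {
    auto it = operations.find(operationId);
    emit(Event{
        operationId,
        it != operations.end() ? it->second : "OPERATION_UNKNOWN"});
  }
};

struct Master {
  std::map<std::string, StateOwner> owners;    // Owner ID -> state owner.
  std::deque<std::function<void()>> pending;   // Deferred work.
  std::vector<Event> events;                   // Framework event stream.

  // Handles the call: only enqueues work and answers "202 Accepted".
  int reconcile(const std::vector<Request>& requests) {
    for (const Request& request : requests) {
      pending.push_back([this, request]() {
        auto emit = [this](const Event& event) { events.push_back(event); };
        auto owner = owners.find(request.ownerId);
        if (owner == owners.end()) {
          // Assumption of this sketch: the master answers for owners it
          // does not know.
          emit(Event{request.operationId, "OPERATION_UNKNOWN"});
        } else {
          owner->second.reconcile(request.operationId, emit);
        }
      });
    }
    return 202;
  }

  // Runs the deferred work; stands in for asynchronous message delivery.
  void drain() {
    while (!pending.empty()) {
      std::function<void()> task = std::move(pending.front());
      pending.pop_front();
      task();
    }
  }
};
```

In this model the HTTP response carries no operation data at all; statuses only show up as events once the owning entity has been consulted.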
> Semantics of RECONCILE_OPERATIONS framework API call are incorrect
> ------------------------------------------------------------------
>
> Key: MESOS-9448
> URL: https://issues.apache.org/jira/browse/MESOS-9448
> Project: Mesos
> Issue Type: Bug
> Components: framework, HTTP API, master
> Reporter: Benjamin Bannier
> Priority: Major
>
> The typical pattern in the framework HTTP API is that frameworks send calls
> to which the master responds with {{Accepted}} responses and which trigger
> events. The only designed exception to this are {{SUBSCRIBE}} calls to which
> the master responds with an {{Ok}} response containing the assigned framework
> ID. This is even codified in {{src/scheduler.cpp:646ff}},
> {code}
> if (response->code == process::http::Status::OK) {
> // Only SUBSCRIBE call should get a "200 OK" response.
> CHECK_EQ(Call::SUBSCRIBE, call.type());
> {code}
> Currently, the handling of {{RECONCILE_OPERATIONS}} calls does not follow
> this pattern. Instead of sending events, the master immediately responds with
> an {{Ok}} and a list of operations. This leads, e.g., to assertion failures in
> the above hard check whenever one uses {{Scheduler::send}} instead of
> {{Scheduler::call}}. One can reproduce this by modifying the existing tests
> in {{src/operation_reconciliation_tests.cpp}},
> {code}
> mesos.send({createCallReconcileOperations(frameworkId, {operation})}); // ADD THIS.
> const Future<scheduler::APIResult> result =
> mesos.call({createCallReconcileOperations(frameworkId, {operation})});
> {code}