Crashing the agent is definitely not a viable option IMO.

Why can't we use agent capabilities instead of the agent version and reject
such operations at the master? This is one of the main reasons we introduced
the concept of framework, master, and agent capabilities.
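
To make the idea concrete, here is a rough sketch of what a capability-based
check at the master could look like. All names below (the simplified `Agent`
struct, `AgentCapability::RESIZE_VOLUME`, `agentSupports`) are hypothetical
stand-ins for illustration, not the actual Mesos types or an existing API.

```cpp
#include <set>

// Hypothetical, simplified stand-ins for the real Mesos types.
enum class OperationType { RESERVE, CREATE, GROW_VOLUME, SHRINK_VOLUME };

// Hypothetical capability an agent would advertise when it registers.
enum class AgentCapability { MULTI_ROLE, RESIZE_VOLUME };

struct Agent {
  std::set<AgentCapability> capabilities;
};

// Returns true if the agent has advertised the capability the operation
// needs. Operations that predate the capability are always allowed.
bool agentSupports(const Agent& agent, OperationType type) {
  switch (type) {
    case OperationType::GROW_VOLUME:
    case OperationType::SHRINK_VOLUME:
      return agent.capabilities.count(AgentCapability::RESIZE_VOLUME) > 0;
    default:
      return true;
  }
}
```

With something like this, the master would drop a GROW_VOLUME or
SHRINK_VOLUME operation whenever `agentSupports` returns false, without
comparing version numbers.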

On Mon, Apr 16, 2018 at 2:04 PM, Chun-Hung Hsiao <chhs...@apache.org> wrote:

> Hi all,
>
> As some of you might already know, we are currently working on patches to
> implement the new GROW_VOLUME and SHRINK_VOLUME operations [1].
>
> One problem that has surfaced is that, since the new operations are not
> supported in Mesos 1.5, they will lead to an agent crash during the operation
> application cycle if a Mesos 1.6 master sends these operations to a Mesos 1.5
> agent [2].
>
> We are now considering two possibilities to address this compatibility
> problem:
>
> 1) The Mesos 1.6 master should check the agent's Mesos version in
> `Master::accept` [3]. Moving forward, if we add new operations in future
> Mesos releases, we would have code like the following:
>
> ```
> // Get the Mesos version of the slave of the offer.
> Version slaveVersion = ...;
> switch (operation.type()) {
>   ...
>   case SOME_NEW_OPERATION: {
>     if (slaveVersion < minVersionForSomeNewOperation) {
>       ... // Drop the operation.
>     }
>     break;
>   }
>   ...
> }
> ```
>
> Pros and cons:
> + The new operation won't go into the operation application cycle since it is
>   rejected at the very beginning. This means no resource metadata is touched.
> - Explicit slave version checks on the master side make the code less clean,
>   and we will need to update this list every time we add a new operation.
>
> 2) Treat this issue as an agent crash bug. The Mesos master would forward
> the operation to the agent, regardless of the agent's Mesos version. In the
> agent, we deploy and backport the following logic in `Slave::applyOperation`
> [4]:
>
> ```
> if (message.operation_info().type() == OPERATION_UNKNOWN) {
>   ... // Drop the operation and trigger a re-registration or send an
>       // `UpdateSlaveMessage` to force the master to update the total
>       // resource of the slave.
> }
> ```
>
> Pros and cons:
> + Easier to add new operations since no new logic needs to be added for
>   backward compatibility.
> - Since the old agent won't know whether the new operations are speculative
>   or not, a re-registration or an `UpdateSlaveMessage` is required.
> - Mesos 1.5.0 agents will still have the bug and crash when a new master
>   sends a new operation to them.
>
> Since both options are viable and there seems to be no clear winner, we'd
> like to check with the community to see which convention is preferable.
> Please let us know what you think. Thanks!
>
> Best,
> Chun-Hung
>
>
> [1] https://issues.apache.org/jira/browse/MESOS-4965
> [2] https://github.com/apache/mesos/blob/1.5.x/src/common/protobuf_utils.cpp#L851
> [3] https://github.com/apache/mesos/blob/master/src/master/master.cpp#L3899
> [4] https://github.com/apache/mesos/blob/1.5.x/src/slave/slave.cpp#L4359
>
