[ https://issues.apache.org/jira/browse/MESOS-9460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16787423#comment-16787423 ]
Greg Mann commented on MESOS-9460:
----------------------------------

Sharing some learnings here after a couple difficult attempts at resolving this issue:

The most recent reviews attempt to add a {{Sequence}} to the master which prevents problematic interleavings between multiple updates to the allocator/master state. Due to recent changes which added the concept of "orphan operations" (MESOS-9542), it has become difficult to accomplish this with per-agent {{Sequence}}s, without some messy refactoring of functions like {{recoverFramework()}}. It's fairly straightforward to accomplish a fix with a global {{Sequence}} in the master, but this seems undesirable since, for example, all calls to {{updateOperationStatus()}} would need to be sequenced.

Since the orphan operation code is tech debt which can be removed once MESOS-9556 and MESOS-8582 are resolved, I'm hesitant to add more complexity to the code on top of the orphan operation handling. I think I would prefer to punt on the issue described in this ticket until those other issues are resolved, at which point we will be able to handle this one in a simpler way.

> Speculative operations may make master and allocator resource views out of
> sync.
> --------------------------------------------------------------------------------
>
>                 Key: MESOS-9460
>                 URL: https://issues.apache.org/jira/browse/MESOS-9460
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, master
>    Affects Versions: 1.5.1, 1.6.1, 1.7.0
>            Reporter: Meng Zhu
>            Assignee: Greg Mann
>            Priority: Major
>              Labels: foundations
>
> When speculative operations (RESERVE, UNRESERVE, CREATE, DESTROY) are issued
> via the master operator API, the master updates the allocator state in
> {{Master::apply()}}, and then later updates its internal state in
> {{Master::_apply}}. This means that other updates to the allocator may be
> interleaved between these two continuations, causing the master state to be
> out of sync with the allocator state.
> This bug could happen with the following sequence of events:
> - agent (re)registers with the master
> - multiple speculative operation calls are made to the master via the
> operator API
> - the allocator is speculatively updated in
> https://github.com/apache/mesos/blob/1d1af190b0eb674beecf20646d0b6ce082db4ed0/src/master/master.cpp#L11326
> - before the agent's resources get updated, the agent sends
> `UpdateSlaveMessage` upon receiving the (re)registered message, if it has
> the `RESOURCE_PROVIDER` capability or oversubscription is used
> (https://github.com/apache/mesos/blob/3badf7179992e61f30f5a79da9d481dd451c7c2f/src/slave/slave.cpp#L1560-L1566
> and
> https://github.com/apache/mesos/blob/3badf7179992e61f30f5a79da9d481dd451c7c2f/src/slave/slave.cpp#L1643-L1648)
> - as long as the first operation via the operator API has been added to the
> {{Slave}} struct at this point, the master won't hit [this block
> here|https://github.com/apache/mesos/blob/1d1af190b0eb674beecf20646d0b6ce082db4ed0/src/master/master.cpp#L7940-L7945]
> and the `UpdateSlaveMessage` triggers the allocator to update the total
> resources
> [here|https://github.com/apache/mesos/blob/1d1af190b0eb674beecf20646d0b6ce082db4ed0/src/master/master.cpp#L8207].
> Since the {{Slave}} struct has not yet been updated, the allocator update
> at that point uses STALE resources from {{slave->totalResources}}, so the
> update from the previous operation is overwritten and LOST.
> - the agent finishes the operation and informs the master through
> `UpdateOperationStatusMessage`, but for a speculative operation we do not
> update the allocator
> https://github.com/apache/mesos/blob/3badf7179992e61f30f5a79da9d481dd451c7c2f/src/master/master.cpp#L11187-L11189
> - the resource views of the master/agent state and the allocator state are
> now inconsistent
> This caused MESOS-7971 and likely MESOS-9458 as well.
> It's unclear how this can be fixed in a reliable way. It's possible that
> performing the updates to the allocator state and the master state in a
> single synchronous block of code could work, but in the case of
> operator-initiated operations this is difficult. It may also be possible
> to maintain consistency by ensuring that, every time such updates are done
> in the master, the allocator is updated before the master state.
> This ticket will be Done when a comprehensive solution for this issue is
> designed. A subsequent ticket for actual implementation of that solution
> should be filed.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
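
For readers who want to see the interleaving from the ticket in isolation, here is a rough sketch of it as a toy model. This is not Mesos code: {{ToyMaster}}, its members, and the helper functions are all hypothetical names made up for illustration. It assumes a single speculative operation and ignores the early-return block in the master, so it only demonstrates the "allocator first, master state later" ordering problem and how running the two steps back to back (roughly what a {{Sequence}} would enforce) avoids it.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <set>
#include <string>

// Hypothetical toy model; none of these names are Mesos APIs.
struct ToyMaster {
  std::set<std::string> masterTotal;     // stands in for slave->totalResources
  std::set<std::string> allocatorTotal;  // stands in for the allocator's view
  std::queue<std::function<void()>> continuations;  // deferred work

  // Like Master::apply(): update the allocator now, the master state later.
  void applySpeculative(const std::string& reservation) {
    allocatorTotal.insert(reservation);
    continuations.push([this, reservation]() {
      masterTotal.insert(reservation);  // like Master::_apply
    });
  }

  // Like handling UpdateSlaveMessage: the allocator's total is overwritten
  // from the master's current (possibly stale) view of the agent.
  void handleUpdateSlave() { allocatorTotal = masterTotal; }

  // Run every deferred continuation in FIFO order.
  void runContinuations() {
    while (!continuations.empty()) {
      std::function<void()> next = continuations.front();
      continuations.pop();
      next();
    }
  }
};

// The agent update lands between the two halves of the operation.
bool divergesWhenInterleaved() {
  ToyMaster m;
  m.applySpeculative("disk(reserved)");
  m.handleUpdateSlave();  // overwrites the allocator with stale master state
  m.runContinuations();   // master state is updated too late
  return m.masterTotal.count("disk(reserved)") == 1 &&
         m.allocatorTotal.count("disk(reserved)") == 0;
}

// Both halves of the operation complete before the agent update is handled.
bool agreesWhenSequenced() {
  ToyMaster m;
  m.applySpeculative("disk(reserved)");
  m.runContinuations();
  m.handleUpdateSlave();
  return m.masterTotal == m.allocatorTotal &&
         m.allocatorTotal.count("disk(reserved)") == 1;
}

// Convenience check; call from main() to run both scenarios.
void checkToyModel() {
  assert(divergesWhenInterleaved());
  assert(agreesWhenSequenced());
}
```

In the interleaved case the master ends up knowing about the reservation while the allocator does not, which is the inconsistency the ticket describes. The sequenced case only shows the ordering a fix would need to guarantee; it says nothing about how hard that is to enforce with orphan operations in the picture.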