[ 
https://issues.apache.org/jira/browse/MESOS-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749003#comment-16749003
 ] 

Meng Zhu commented on MESOS-8850:
---------------------------------

After further investigation, it turns out that for the race to manifest as 
the check failure in MESOS-8778, there is actually another “bug” in the sorter.

In the [`update` 
function|https://github.com/apache/mesos/blob/ea824dca48d1dc839b41fd9fdfff71f2673c40aa/src/master/allocator/sorter/drf/sorter.hpp#L387-L418],
 when we subtract `oldResources`, we do not perform the shared resource 
uniqueness check, unlike the [`subtract` 
function|https://github.com/apache/mesos/blob/ea824dca48d1dc839b41fd9fdfff71f2673c40aa/src/master/allocator/sorter/drf/sorter.hpp#L363-L371]
 above it, which does.

Before update: resource == 100 disk (sharedCount == 2); quantity == 100
Update: convert 100 shared disk to 100 normal disk (i.e. destroy)
After update: resource == 100 disk (sharedCount == 1), 100 disk (normal); 
quantity == 100

Now the full `resources` have diverged from the tracked resource quantities, 
which is why the quantity check failed but the `resources` check did not.
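
The divergence can be reproduced with a toy model. The sketch below is not the 
Mesos code: `ToyAllocation`, its fields, and its method names are made up for 
illustration, and it tracks a single 100-unit disk. It only assumes what is 
described above, namely that a shared resource is counted once in the 
quantities and that the update path skips the uniqueness check.

```python
from dataclasses import dataclass

DISK = 100  # size of the single disk in this toy model (assumption)


@dataclass
class ToyAllocation:
    shared_count: int = 0  # copies of the shared disk held in `resources`
    normal: int = 0        # non-shared disk held in `resources`
    quantity: int = 0      # separately tracked scalar quantity

    def add_shared(self) -> None:
        # A shared disk contributes to the quantity only once.
        if self.shared_count == 0:
            self.quantity += DISK
        self.shared_count += 1

    def subtract_shared(self) -> None:
        # With the uniqueness check: drop the quantity only when the
        # last copy of the shared disk leaves the allocation.
        if self.shared_count == 0:
            raise RuntimeError("resources check failed")
        self.shared_count -= 1
        if self.shared_count == 0:
            if self.quantity < DISK:
                raise RuntimeError("quantity check failed")
            self.quantity -= DISK

    def subtract_normal(self) -> None:
        if self.normal < DISK:
            raise RuntimeError("resources check failed")
        if self.quantity < DISK:
            raise RuntimeError("quantity check failed")
        self.normal -= DISK
        self.quantity -= DISK

    def update_shared_to_normal_unchecked(self) -> None:
        # Without the uniqueness check: one shared copy is swapped for a
        # normal disk and the quantity is blindly decremented, then
        # incremented, regardless of how many shared copies remain.
        if self.shared_count == 0:
            raise RuntimeError("resources check failed")
        self.shared_count -= 1
        self.normal += DISK
        self.quantity -= DISK
        self.quantity += DISK

    def expected_quantity(self) -> int:
        # What the quantity should be given the full resources.
        return (DISK if self.shared_count > 0 else 0) + self.normal


alloc = ToyAllocation()
alloc.add_shared()
alloc.add_shared()                         # sharedCount == 2, quantity == 100
alloc.update_shared_to_normal_unchecked()  # the destroy
print(alloc, alloc.expected_quantity())    # quantity is 100, expected is 200
```

In this model, removing the normal disk and then the last shared copy passes 
the resources checks but raises `quantity check failed` on the second removal.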

Here, we are updating `resource == 100 disk (sharedCount = 2)` to `100 disk 
normal`, i.e. we are destroying the shared disk while it is still being shared, 
which is the original culprit.

So if we fix the `update` function, even with the race condition the sorter 
will be fine (will not hit the check error), but we will likely hit another 
error somewhere.

We could consider checking in the update function that there cannot be any 
conversion unless sharedCount == 1.
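
A minimal sketch of such a guard, written against a toy representation rather 
than the real sorter types. `convert_shared_to_normal` and 
`SharedConversionError` are hypothetical names; how the check would actually be 
expressed in the sorter is left open here.

```python
from typing import Tuple


class SharedConversionError(Exception):
    """Raised when a shared resource is converted while still shared."""


def convert_shared_to_normal(
    shared_count: int, normal: int, size: int
) -> Tuple[int, int]:
    """Return the new (shared_count, normal) after converting a shared disk.

    Hypothetical guard: the conversion is allowed only when exactly one
    copy of the shared disk is left in the allocation.
    """
    if shared_count != 1:
        raise SharedConversionError(
            "cannot convert shared disk: sharedCount == %d" % shared_count
        )
    return 0, normal + size
```

With this guard, the update from the example above (sharedCount == 2) is 
rejected at the point of conversion instead of silently corrupting the 
quantities.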

> Race between master and allocator when destroying shared volume could lead to 
> sorter check failure.
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-8850
>                 URL: https://issues.apache.org/jira/browse/MESOS-8850
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation, master
>            Reporter: Meng Zhu
>            Priority: Major
>
> When destroying a shared volume, the master first rescinds offers that 
> contain the shared volume and then applies the destroy operation. This 
> process involves interaction between the master and allocator actors. The 
> following race could arise:
> 1. Framework1 and framework2 are each offered a shared disk;
> 2. Framework2 asks the master to destroy the shared disk;
> 3. Master rescinds framework1's offer that contains the shared disk;
> 4. `allocator->recoverResources` is called to recover framework1’s offered 
> resources in the allocator;
> 5. [Race] Shortly afterwards, the allocator allocates resources to 
> framework1. The allocation contains the shared disk that was just recovered 
> and has not yet been destroyed. The allocator invokes `offerCallback`, which 
> dispatches to the master;
> 6. Master continues the destroy operation and calls 
> `allocator->updateAllocation` to notify the allocator to transform the shared 
> disk to regular reserved disk;
> 7. Master processes the `offerCallback` dispatched in step 5 and offers the 
> shared disk to framework1.
> At this point, the same disk resource appears in two different places: one 
> shared, offered to framework1; one not shared, currently held by framework2 
> (soon to be recovered).
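
The ordering in steps 1-7 can be replayed with a toy, single-threaded model of 
the two actors. Everything here (`replay_race`, the dictionary keys, the string 
labels) is hypothetical and only mirrors the message ordering described in the 
issue; it is not how the master or allocator is implemented.

```python
from collections import deque


def replay_race():
    # Messages dispatched to the master, processed in FIFO order.
    master_mailbox = deque()

    # The allocator's current view of the disk.
    allocator_disk = "shared"

    # Step 1: both frameworks are offered the shared disk.
    held = {"framework1": "shared", "framework2": "shared"}

    # Step 2: framework2 asks the master to destroy the shared disk.
    # Step 3: master rescinds framework1's offer.
    held["framework1"] = None

    # Step 4: recoverResources returns framework1's copy to the allocator.
    # Step 5 [race]: the allocator re-allocates the disk, which is still
    # shared at this moment, and dispatches offerCallback to the master.
    master_mailbox.append(("offerCallback", "framework1", allocator_disk))

    # Step 6: master continues the destroy; updateAllocation turns the
    # shared disk into a regular disk in the allocator.
    allocator_disk = "regular"
    held["framework2"] = "regular"

    # Step 7: master processes the now stale offerCallback.
    _, framework, disk = master_mailbox.popleft()
    held[framework] = disk

    return allocator_disk, held


allocator_disk, held = replay_race()
print(allocator_disk, held)
```

The stale message carries the disk as it looked at step 5, so framework1 ends 
up with a shared copy while framework2 and the allocator see a regular disk.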



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
