Sagar Sadashiv Patwardhan created MESOS-7882:

             Summary: Mesos master rescinds all the in-flight offers from all 
the registered agents when a new maintenance schedule is posted for a subset of 
                 Key: MESOS-7882
             Project: Mesos
          Issue Type: Bug
          Components: master
    Affects Versions: 1.3.0
         Environment: Ubuntu 14.04 (trusty)
Mesos master branch.
SHA: a31dd52ab71d2a529b55cd9111ec54acf7550ded
            Reporter: Sagar Sadashiv Patwardhan
            Priority: Minor

We are running Mesos 1.1.0 in production. We use a custom autoscaler to scale 
our Mesos cluster up and down. While scaling down the cluster, the autoscaler 
makes a POST request to the Mesos master's /maintenance/schedule endpoint with 
a set of slaves to move to maintenance mode. This forces the Mesos master to 
rescind all the in-flight offers from *all the slaves* in the cluster. If our 
scheduler accepts one of these offers, we get a TASK_LOST status update back 
for that task. We also see log lines for this in the Mesos master logs.
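
For reference, the kind of request the autoscaler makes can be sketched as 
below. This is a minimal sketch, not our autoscaler's actual code: the helper 
names, the example hostname/IP and the master address are hypothetical, and 
the payload layout (windows, machine_ids, unavailability in nanoseconds) is 
assumed to follow the documented maintenance schedule format.

```python
import json
import time
import urllib.request

NANOS_PER_SECOND = 1_000_000_000


def build_schedule(machines, start_seconds, duration_seconds):
    """Build a one-window maintenance schedule for (hostname, ip) pairs."""
    return {
        "windows": [
            {
                "machine_ids": [
                    {"hostname": hostname, "ip": ip} for hostname, ip in machines
                ],
                "unavailability": {
                    "start": {"nanoseconds": int(start_seconds * NANOS_PER_SECOND)},
                    "duration": {"nanoseconds": int(duration_seconds * NANOS_PER_SECOND)},
                },
            }
        ]
    }


def post_schedule(master_url, schedule):
    """POST the schedule to the master and return the HTTP status code."""
    request = urllib.request.Request(
        master_url.rstrip("/") + "/maintenance/schedule",
        data=json.dumps(schedule).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical agent, for illustration only.
    schedule = build_schedule([("agent-1.example.com", "10.0.0.1")], time.time(), 3600)
    print(json.dumps(schedule, indent=2))
    # Uncomment to send it to a (hypothetical) local master:
    # post_schedule("http://localhost:5050", schedule)
```

Only the machines listed under machine_ids are being scheduled for 
maintenance, which is why rescinding offers from every other slave is 
surprising.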

After reading the code, it appears that offers are getting rescinded for all 
the slaves. I am not sure what the expected behavior is here, but it would 
make more sense if only resources from slaves marked for maintenance were 
reclaimed.

To verify that this is actually happening, I checked out the master branch 
(sha: a31dd52ab71d2a529b55cd9111ec54acf7550ded) and added some log lines. I 
built the binary and started a Mesos master and 2 agent processes. I used a 
basic Python framework that launches Docker containers on these slaves. I 
verified with `curl` that there was no existing schedule for any slave, then 
posted a maintenance schedule for one of the slaves after starting the Mesos 
framework.
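
The "no existing schedule" check can also be scripted. The sketch below is 
illustrative only: scheduled_hostnames is a hypothetical helper, and it 
assumes the body returned by a GET on /maintenance/schedule is a JSON object 
with an optional "windows" list, each window carrying "machine_ids".

```python
import json


def scheduled_hostnames(schedule_json):
    """Return the set of hostnames named in a maintenance schedule body."""
    # Treat an empty body the same as an empty schedule object.
    schedule = json.loads(schedule_json) if schedule_json.strip() else {}
    hostnames = set()
    for window in schedule.get("windows", []):
        for machine in window.get("machine_ids", []):
            if "hostname" in machine:
                hostnames.add(machine["hostname"])
    return hostnames


if __name__ == "__main__":
    # An empty schedule means no slave is marked for maintenance.
    print(scheduled_hostnames("{}"))
```

An empty set before the POST, and a single hostname after it, confirms that 
only one slave was ever scheduled for maintenance in this test.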

Mesos framework:

I think Mesos should rescind offers and inverse offers only for those slaves 
that are marked for maintenance (draining mode).
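
To make the proposal concrete, here is a toy model of the selection logic. 
It is not the Mesos master code (which is C++); the Offer class and the 
offers_to_rescind function are hypothetical names used only to illustrate 
the intended behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    offer_id: str
    agent_hostname: str


def offers_to_rescind(outstanding_offers, maintenance_hostnames):
    """Select only the offers whose agent is in the maintenance schedule."""
    return [
        offer
        for offer in outstanding_offers
        if offer.agent_hostname in maintenance_hostnames
    ]


if __name__ == "__main__":
    offers = [Offer("o1", "agent-1"), Offer("o2", "agent-2")]
    # Only agent-1 is scheduled for maintenance, so only o1 is selected.
    for offer in offers_to_rescind(offers, {"agent-1"}):
        print(offer.offer_id)
```

With two agents and a schedule covering only one of them, the offer from the 
other agent stays valid, so accepting it would not end in TASK_LOST.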

This message was sent by Atlassian JIRA