Michael Gummelt created MESOS-6112:
--------------------------------------

             Summary: Frameworks are starved when > 5 are run concurrently
                 Key: MESOS-6112
                 URL: https://issues.apache.org/jira/browse/MESOS-6112
             Project: Mesos
          Issue Type: Task
          Components: master
    Affects Versions: 1.0.1
            Reporter: Michael Gummelt


As I understand it, the master sends an offer to frameworks one at a time, in 
DRF order, until one of them accepts it, waiting 1s between successive 
offers.  Once the decline timeout for the first framework expires, the master 
starts over at the beginning of the list rather than continuing on to the 
remaining frameworks, which starves them.  With the default 5s decline timeout 
and the 1s wait, only the first five frameworks in the list ever see the offer.
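
The behavior described above can be reproduced with a toy simulation.  This is 
only a sketch of my understanding, not the allocator's actual code: the 
function name simulate_offers and its parameters are made up for illustration, 
it models a single offer, every framework declines immediately, and the DRF 
order is assumed to stay fixed because nobody ever accepts.

```python
def simulate_offers(num_frameworks, refuse_seconds=5, interval=1, duration=60):
    """Count how often each framework sees the single offer (toy model)."""
    # Time at which each framework's decline filter lapses (hypothetical state).
    filter_expiry = [0] * num_frameworks
    offers_seen = [0] * num_frameworks

    for now in range(0, duration, interval):
        # Always restart at the head of the (fixed) DRF order and offer to
        # the first framework whose decline filter is no longer active.
        for fw in range(num_frameworks):
            if now >= filter_expiry[fw]:
                offers_seen[fw] += 1
                # The framework declines, so it is filtered for refuse_seconds.
                filter_expiry[fw] = now + refuse_seconds
                break

    return offers_seen


print(simulate_offers(8))  # → [12, 12, 12, 12, 12, 0, 0, 0]
```

With a 5s decline timeout and a 1s wait, the sixth and later frameworks never 
see the offer in this model.  Passing refuse_seconds=10 lets all eight 
frameworks see it.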

This means that for Mesos to support > 5 concurrent frameworks, every 
framework must be a good citizen and either set a large decline timeout or 
suppress offers.  I think this is a fairly undesirable state of affairs.

I propose that the master instead continue to submit the offer to every 
registered framework, even if an earlier framework's decline timeout has 
already expired.
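
As a rough sketch of the proposal, again a toy model rather than allocator 
code: the master keeps walking the list and only returns to the head after 
every framework has seen the offer.  The function name simulate_full_pass and 
its parameters are made up for illustration, and every framework is assumed to 
decline.

```python
def simulate_full_pass(num_frameworks, interval=1, duration=60):
    """Count offers per framework when the master always finishes the list."""
    offers_seen = [0] * num_frameworks
    next_fw = 0  # position in the (fixed) DRF order

    for _ in range(0, duration, interval):
        offers_seen[next_fw] += 1
        # Move on to the next framework; wrap around only at the end of the
        # list, regardless of whose decline timeout has expired meanwhile.
        next_fw = (next_fw + 1) % num_frameworks

    return offers_seen


print(simulate_full_pass(8))  # → [8, 8, 8, 8, 7, 7, 7, 7]
```

No framework is starved in this model, but each one now waits 
num_frameworks * interval seconds between consecutive offers.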

The potential increase in task startup latency introduced by this change can 
be partly offset if we also make the master smarter about how long to wait 
between successive offers, rather than using a static 1s.

  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
