[ https://issues.apache.org/jira/browse/FLINK-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till Rohrmann reassigned FLINK-14158:
-------------------------------------
Assignee: Piyush Narang
> Update Mesos configs to add leaseOfferExpiration and declinedOfferRefuse
> durations
> ----------------------------------------------------------------------------------
>
> Key: FLINK-14158
> URL: https://issues.apache.org/jira/browse/FLINK-14158
> Project: Flink
> Issue Type: Bug
> Reporter: Piyush Narang
> Assignee: Piyush Narang
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> While debugging some Flink on Mesos scheduling issues (tied to our use of
> Mesos quotas), we fairly often receive skewed offers that are useless to us.
> Because we do not reject these offers fast enough, and do not tell Mesos to
> hold off re-sending them for long enough, we can go upwards of an hour
> without being able to schedule our job (~30 Mesos containers).
> By default, Fenzo rejects expired and unused Mesos offers after 120s; this
> can be overridden using its TaskScheduler builder. Additionally, Mesos
> allows us to override the time for which it won't re-send declined offers
> (default is 5s). We found that rejecting more aggressively (after 1s instead
> of 120s) and keeping rejected offers away for longer (60s instead of 5s)
> dramatically increases our chances of scheduling our jobs on Mesos.
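
For illustration only, the two durations described above can be modelled as a small value class. This is a minimal sketch, not the code from the pull request: the class name OfferTimings and its methods are hypothetical, and the only values taken from the issue are the defaults (120s, 5s) and the tuned settings (1s, 60s). The comments name the Fenzo and Mesos calls where such values would presumably be passed (TaskScheduler.Builder#withLeaseOfferExpirySecs and Protos.Filters.Builder#setRefuseSeconds); treat those as assumptions to check against the library versions in use.

```java
import java.time.Duration;

/** Hypothetical holder for the two offer-handling durations discussed in this issue. */
final class OfferTimings {
    /** How long an unused offer is held before it is rejected (Fenzo side). */
    private final Duration unusedOfferExpiration;
    /** How long Mesos is asked not to re-send a declined offer (Mesos side). */
    private final Duration declinedOfferRefuse;

    OfferTimings(Duration unusedOfferExpiration, Duration declinedOfferRefuse) {
        if (unusedOfferExpiration.isNegative() || declinedOfferRefuse.isNegative()) {
            throw new IllegalArgumentException("durations must not be negative");
        }
        this.unusedOfferExpiration = unusedOfferExpiration;
        this.declinedOfferRefuse = declinedOfferRefuse;
    }

    /** Defaults quoted in the issue: expire after 120s, refuse for 5s. */
    static OfferTimings defaults() {
        return new OfferTimings(Duration.ofSeconds(120), Duration.ofSeconds(5));
    }

    /** Settings the reporter found to work better: expire after 1s, refuse for 60s. */
    static OfferTimings aggressive() {
        return new OfferTimings(Duration.ofSeconds(1), Duration.ofSeconds(60));
    }

    /** Whole seconds; assumed to be passed to Fenzo's withLeaseOfferExpirySecs(long). */
    long leaseOfferExpirySecs() {
        return unusedOfferExpiration.getSeconds();
    }

    /** Fractional seconds; assumed to be passed to Mesos' Filters setRefuseSeconds(double). */
    double refuseSeconds() {
        return declinedOfferRefuse.toMillis() / 1000.0;
    }

    public static void main(String[] args) {
        OfferTimings t = aggressive();
        System.out.println("expire unused offers after " + t.leaseOfferExpirySecs() + "s");
        System.out.println("refuse declined offers for " + t.refuseSeconds() + "s");
    }
}
```

Keeping both values as Duration until the last moment avoids mixing up the unit each API expects: whole seconds on the Fenzo side and fractional seconds on the Mesos side.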
--
This message was sent by Atlassian Jira
(v8.3.4#803005)