[ https://issues.apache.org/jira/browse/AURORA-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian Hatfield updated AURORA-1579:
-----------------------------------
Summary: Allow preflight-check of Job schedulability (was: Allow preflight-check of Job schedulability.)
> Allow preflight-check of Job schedulability
> -------------------------------------------
>
> Key: AURORA-1579
> URL: https://issues.apache.org/jira/browse/AURORA-1579
> Project: Aurora
> Issue Type: Task
> Components: Client, Scheduler
> Reporter: Brian Hatfield
> Priority: Minor
>
> The goal of this feature is to allow users to check if their job (as
> configured) would likely be schedulable given Aurora's current offers. An
> extended form of this feature would be able to perform this test while
> assuming any current instance of the job in question would be stopped.
> Here is the suggestion I sent to the mailing list describing my use-case for
> such a feature:
> {quote}
> We currently run a (relatively) small Mesos/Aurora cluster, and don't always
> have significant resource overhead available.
> Sometimes we go to schedule a job and find we're just short of the resources
> we estimated by hand that we'd need in the cluster for it. Most of the tasks
> schedule, but a few stay "PENDING" because of the resource constraint. This
> often confuses users or, in some cases, causes the command to block for a
> while until it eventually times out.
> We're currently working in-house on automating a somewhat more precise basic
> estimate, using information sourced from /offers, to get a sense of "nope,
> your task won't schedule" and provide fast feedback that doesn't manipulate
> the state of the cluster.
> However, our basic estimation doesn't include co-scheduling constraints,
> quotas, etc., which seem like something Aurora would be able to determine.
> {quote}
> This kind of feature is inherently subject to race conditions and future
> restrictions: a passing check describes the cluster only at the moment it
> runs. Somewhat paradoxically, the feature is more useful the smaller your
> quota or cluster is, since in a constrained environment many actions will
> require adding capacity (or quota) first. Its documentation should mention
> that tasks could still end up PENDING in some cases: losing a race, a host
> failure, "oddly shaped tasks" failing to reschedule, etc.
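A minimal sketch of the basic offer-based estimate described above. It assumes the contents of /offers have already been parsed and summed per host into free CPU, RAM, and disk; that parsing step, and the names `Resources` and `count_placeable`, are hypothetical and not part of Aurora. `max_per_host` stands in for a simple co-scheduling limit (in the spirit of a per-host limit constraint), and `reclaimed_by_host` models the extended form that assumes the job's current instances are stopped first. Because all instances of a job share one shape and hosts do not share resources, first-fit gives the exact count for the snapshot, but it ignores quota, other constraints, and preemption, and is subject to the races noted above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Resources:
    # Hypothetical resource vector. CPU is kept in millicores (1000 == 1 CPU)
    # so repeated subtraction does not accumulate floating-point error.
    millicpus: int
    ram_mb: int
    disk_mb: int

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.millicpus + other.millicpus,
                         self.ram_mb + other.ram_mb,
                         self.disk_mb + other.disk_mb)

    def __sub__(self, other: "Resources") -> "Resources":
        return Resources(self.millicpus - other.millicpus,
                         self.ram_mb - other.ram_mb,
                         self.disk_mb - other.disk_mb)

    def fits_in(self, other: "Resources") -> bool:
        return (self.millicpus <= other.millicpus
                and self.ram_mb <= other.ram_mb
                and self.disk_mb <= other.disk_mb)


def count_placeable(free_by_host: Dict[str, Resources],
                    task: Resources,
                    instances: int,
                    max_per_host: Optional[int] = None,
                    reclaimed_by_host: Optional[Dict[str, Resources]] = None) -> int:
    """Estimate how many of `instances` identical tasks fit into a snapshot
    of offered resources, using first-fit over hosts in name order.

    `max_per_host` approximates a per-host limit constraint.
    `reclaimed_by_host` adds back resources that the job's current instances
    would release if they were stopped before the new ones are placed.
    """
    free = dict(free_by_host)  # copy so the caller's snapshot is untouched
    for host, res in (reclaimed_by_host or {}).items():
        free[host] = free.get(host, Resources(0, 0, 0)) + res

    placed_on = {host: 0 for host in free}
    placed = 0
    while placed < instances:
        for host in sorted(free):
            under_limit = max_per_host is None or placed_on[host] < max_per_host
            if under_limit and task.fits_in(free[host]):
                free[host] = free[host] - task
                placed_on[host] += 1
                placed += 1
                break
        else:
            break  # no host can take another instance; the rest stay PENDING
    return placed


offers = {
    "host-a": Resources(2000, 4096, 10240),
    "host-b": Resources(1000, 2048, 10240),
}
task = Resources(1000, 1024, 1024)
reclaimed = {"host-b": Resources(1000, 1024, 1024)}

print(count_placeable(offers, task, 4))                               # 3
print(count_placeable(offers, task, 4, max_per_host=1))               # 2
print(count_placeable(offers, task, 4, reclaimed_by_host=reclaimed))  # 4
```

With these sample offers only 3 of 4 instances fit, so one would stay PENDING; reclaiming the resources of one current instance on host-b lets all 4 fit.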
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)