[ 
https://issues.apache.org/jira/browse/AIRFLOW-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309206#comment-16309206
 ] 

Pratap Naik commented on AIRFLOW-192:
-------------------------------------

I think setting the priority to -1 for all tasks does the trick...
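
A rough sketch of why that would work, assuming the scheduler picks the highest effective priority first and accepts negative weights. The three-task chain and the helper names are made up for illustration; this is not Airflow's code, just the "own weight plus all downstream weights" rule from the issue description written out:

```python
# Hypothetical linear DAG: task -> list of direct downstream tasks.
downstream = {
    "reserve": ["process"],
    "process": ["release"],
    "release": [],
}


def all_downstream(task, graph):
    """Collect every task reachable from `task` (transitive successors)."""
    seen = set()
    stack = list(graph[task])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def effective_priority(task, graph, weights):
    """Own weight plus the weights of all transitive successors."""
    return weights[task] + sum(weights[t] for t in all_downstream(task, graph))


def priorities(graph, weight):
    """Effective priority of every task when all tasks share one weight."""
    weights = {task: weight for task in graph}
    return {task: effective_priority(task, graph, weights) for task in graph}


default = priorities(downstream, 1)    # first task ranks highest
negated = priorities(downstream, -1)   # last task ranks highest

for task in downstream:
    print(task, default[task], negated[task])
```

With a weight of 1 the first task of every dag run outranks the later ones; with -1 the order flips, so runs that are further along get their tasks picked up first.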

> Implement priority_weight aggregation using ancestors (rather than successors)
> ------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-192
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-192
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>    Affects Versions: Airflow 1.7.1.2
>            Reporter: Sergei Iakhnin
>
> Currently tasks are being scheduled based on the priority_weight. The 
> effective priority of a task is its own priority plus the priorities of all 
> tasks that follow it in a dag. This results in undesirable scheduling 
> behaviour in my use case.
> My use case involves running scientific workflows where a number of 
> operations are being carried out on a set of samples. Each sample is 
> handled by a separate dag run that is manually triggered. It is common for 
> several thousand dag instances to be in flight at a given time. The dag 
> reserves a sample, operates on it, and then releases it. I would like for 
> each sample to be reserved for as short a time as possible, so that other 
> programs can have an opportunity to operate on it and dag runs can complete 
> as fast as possible. However, because of the current priority logic, if I 
> were to schedule several thousand dags at a given time, they would first all 
> execute their first state, then all execute their second state, etc. Thus, no 
> dag can complete fully until all dags complete their second-to-last state. This 
> results in unnecessarily long dag run times and simultaneous completion of 
> all dags.
> Ideally, Airflow would support the reverse of the current logic used for 
> priorities i.e. a task's priority is the sum of priorities of all its 
> ancestors. This way, the further along a dag is in its processing the more 
> likely its tasks will get scheduled (thus leading to a shorter completion 
> time, and release of its resources).
> Also, a nominal priority mode would be useful, where a task's priority is 
> exactly the number given to it by the author, in order to allow more 
> scheduling flexibility.
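
For reference, the two modes requested above could be sketched as follows. The DAG shape, task names, weights and function names are all hypothetical, and this is only an illustration of the requested aggregation rules, not an actual Airflow implementation:

```python
# Hypothetical diamond-shaped DAG: task -> list of direct upstream tasks.
upstream = {
    "reserve": [],
    "align": ["reserve"],
    "call": ["reserve"],
    "release": ["align", "call"],
}

# Author-assigned weights (made-up values).
weights = {"reserve": 1, "align": 2, "call": 3, "release": 1}


def ancestors(task, graph):
    """Collect every transitive upstream task, counting each one once."""
    seen = set()
    stack = list(graph[task])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def upstream_priority(task):
    """Ancestor mode: own weight plus the weights of all ancestors."""
    return weights[task] + sum(weights[t] for t in ancestors(task, upstream))


def nominal_priority(task):
    """Nominal mode: exactly the weight the author assigned."""
    return weights[task]


by_ancestors = {task: upstream_priority(task) for task in upstream}
nominal = {task: nominal_priority(task) for task in upstream}

for task in upstream:
    print(task, by_ancestors[task], nominal[task])
```

In ancestor mode the final task of a run carries the largest value, so a run that is nearly finished wins over one that has just started. Note that "reserve" is counted once for "release" even though two paths lead back to it.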



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
