[ 
https://issues.apache.org/jira/browse/AIRFLOW-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-6228:
--------------------------
    Description: Right now each task instance consumes exactly 1 slot in a pool, 
but some tasks are bigger or smaller than others. For tasks that I know are 'big', 
I want to be able to make them consume, say, 4 slots from a pool  (was: Right now only 
a single pool name can be assigned to each task instance.

Ideally 2 different pool names can be assigned to a task_instance.

Use case:

I have 300 Spark tasks writing to 60 different tables (i.e. multiple tasks 
write to the same table).

I want both:
 # Maximum of 30 Spark tasks running in parallel
 # Never more than 1 Spark task writing to the same table in parallel

If I have a 'spark' pool of 30 and assign the 'spark' pool to those tasks, then I 
risk having 2 tasks writing to the same table.

But if I instead have a 'tableA' pool of 1, a 'tableB' pool of 1, a 'tableC' pool 
of 1, etc., and assign the relevant table pool to each task, then I risk having 
more than 30 Spark tasks running in parallel.

I can't use 'parallelism' or other settings because I have other non-Spark 
tasks that I don't want to limit.

 

 )
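
The two constraints in the earlier description (at most 30 Spark tasks at 
once, and never 2 tasks writing to the same table) can be sketched as a single 
admission check. This is only an illustration in plain Python, not Airflow 
code: pick_runnable, the (task_id, table) tuples and max_spark are hypothetical 
names made up for this sketch.

```python
def pick_runnable(queued, running, max_spark=30):
    """Return the queued tasks that may start now (hypothetical helper).

    queued / running: lists of (task_id, table) tuples, in priority order.
    A task starts only if BOTH limits hold: fewer than max_spark tasks are
    running overall, and no running task writes to the same table.
    """
    busy_tables = {table for _, table in running}
    free = max_spark - len(running)
    started = []
    for task_id, table in queued:
        if free <= 0:
            break  # global Spark limit reached
        if table in busy_tables:
            continue  # another task is already writing to this table
        started.append((task_id, table))
        busy_tables.add(table)
        free -= 1
    return started


# 300 tasks spread over 60 tables, as in the use case above.
queued = [(f"task_{i}", f"table_{i % 60}") for i in range(300)]
first_wave = pick_runnable(queued, running=[])
```

With nothing running, first_wave holds 30 tasks that all target different 
tables, which is the behaviour that neither a single 'spark' pool nor 
per-table pools can give on their own.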

> Ability for a single task to consume more than 1 slot of a pool
> ---------------------------------------------------------------
>
>                 Key: AIRFLOW-6228
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6228
>             Project: Apache Airflow
>          Issue Type: New Feature
>          Components: scheduler
>    Affects Versions: 1.10.6
>            Reporter: t oo
>            Priority: Major
>
> Right now each task instance consumes exactly 1 slot in a pool, but some tasks 
> are bigger or smaller than others. For tasks that I know are 'big', I want to be 
> able to make them consume, say, 4 slots from a pool
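
To make the request concrete, here is a minimal sketch of a pool where each 
task declares how many slots it consumes. It is a standalone toy model, not 
Airflow's implementation: the Pool class, try_acquire and the slots argument 
are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical pool with a fixed number of slots (not Airflow's model)."""
    name: str
    total_slots: int
    used_slots: int = 0

    def open_slots(self) -> int:
        return self.total_slots - self.used_slots

    def try_acquire(self, slots: int = 1) -> bool:
        """Reserve `slots` slots if enough are open; otherwise change nothing."""
        if slots < 1:
            raise ValueError("a task must consume at least 1 slot")
        if slots > self.open_slots():
            return False
        self.used_slots += slots
        return True

    def release(self, slots: int = 1) -> None:
        """Give back `slots` slots when a task finishes."""
        self.used_slots = max(0, self.used_slots - slots)


pool = Pool(name="spark", total_slots=10)
big_ok = pool.try_acquire(slots=4)         # 'big' task takes 4 slots
second_big_ok = pool.try_acquire(slots=4)  # 8 of 10 slots now used
third_big_ok = pool.try_acquire(slots=4)   # only 2 slots left, so rejected
small_ok = pool.try_acquire()              # ordinary task takes 1 slot
```

In this model a 'big' task simply waits until enough slots are open, while 
ordinary tasks keep the current behaviour of consuming a single slot.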



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
