[ https://issues.apache.org/jira/browse/AIRFLOW-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

t oo updated AIRFLOW-6388:
--------------------------
    Description: 
Spark jobs can often take many minutes (or even hours) to complete. The SparkSubmitOperator submits a job to a Spark cluster and then polls its status until completion. This means it can consume a 'slot' (i.e. parallelism, dag_concurrency, max_active_dag_runs_per_dag, non_pooled_task_slot_count) for hours while it is not 'doing' anything except polling for status. https://github.com/apache/airflow/pull/6909#discussion_r361838225 suggested it should move to a poke/reschedule model.

"This actually means occupy worker and do nothing for n seconds is it not?
It was OK when it was 1 second but users may set it to even 5 min without 
realising that it occupys the worker.

My comment here is more of a concern rather than an action to do.
Should this work by occupying the worker "indefinitely" or can it be something 
like the sensors with (poke/reschedule)?"
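For illustration only, a minimal sketch (not a proposed implementation) of what a poke/reschedule-style variant could look like, assuming the driver/application id is available after submission and using a hypothetical check_driver_status() helper in place of a real status call. With {{mode='reschedule'}} the task gives up its slot between pokes:

{code:python}
from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils.decorators import apply_defaults


def check_driver_status(driver_id):
    """Hypothetical helper: return True once the submitted Spark job has finished."""
    raise NotImplementedError("query the Spark master / YARN RM here")


class SparkJobSensor(BaseSensorOperator):
    """Waits for a previously submitted Spark job without holding a worker slot.

    With mode='reschedule' the task instance is released between pokes, so a
    parallelism / dag_concurrency / pool slot is only occupied for the duration
    of each status check, not for the whole runtime of the Spark job.
    """

    template_fields = ('driver_id',)

    @apply_defaults
    def __init__(self, driver_id, *args, **kwargs):
        super(SparkJobSensor, self).__init__(*args, **kwargs)
        self.driver_id = driver_id

    def poke(self, context):
        return check_driver_status(self.driver_id)


# Usage sketch: one task submits the job (fire-and-forget) and pushes the
# driver id to XCom; the sensor below then waits for it in reschedule mode.
# wait_for_job = SparkJobSensor(
#     task_id='wait_for_spark_job',
#     driver_id="{{ ti.xcom_pull(task_ids='submit_spark_job') }}",
#     mode='reschedule',
#     poke_interval=300,
# )
{code}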

  was:
My DAG has tasks from 12 different types of operators. One of the operators is the DummyOperator (which is meant to do 'nothing'), but it can't be run during busy times because the {{parallelism}}, {{dag_concurrency}}, {{max_active_dag_runs_per_dag}} and {{non_pooled_task_slot_count}} limits have been met (so it is stuck in the scheduled state). I would like a new config flag (dont_block_dummy=True) so that DummyOperator tasks always get run even if the parallelism etc. limits are met. Without this feature, the only workaround is to set a huge parallelism limit (above the current one) and then give pools to all the other operators in my DAG. But my idea is that DummyOperator should not be subject to these limits, as it is not a resource hog.
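A minimal sketch of that pools workaround (the pool name, DAG and operators below are illustrative only; the pool itself would first have to be created under Admin -> Pools):

{code:python}
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator

with DAG('pool_workaround_example',
         start_date=datetime(2020, 1, 1),
         schedule_interval=None) as dag:

    # DummyOperator stays out of any pool, so it only competes for the
    # (deliberately huge) global parallelism limit.
    start = DummyOperator(task_id='start')

    # Every 'real' operator is pinned to a pool created under Admin -> Pools,
    # so heavyweight work is throttled by the pool rather than by parallelism.
    heavy = BashOperator(
        task_id='heavy_work',
        bash_command='echo doing real work',
        pool='heavy_pool',
    )

    start >> heavy
{code}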

 
h4. Task Instance Details
h5. Dependencies Blocking Task From Getting Scheduled
||Dependency||Reason||
|Unknown|All dependencies are met but the task instance is not running. In most 
cases this just means that the task will probably be scheduled soon unless:
- The scheduler is down or under heavy load
- The following configuration values may be limiting the number of queueable 
processes: {{parallelism}}, {{dag_concurrency}}, 
{{max_active_dag_runs_per_dag}}, {{non_pooled_task_slot_count}}
 
If this task instance does not start soon please contact your Airflow 
administrator for assistance.|


> SparkSubmitOperator polling should not 'consume' a slot
> -------------------------------------------------------
>
>                 Key: AIRFLOW-6388
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6388
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: dependencies, scheduler
>    Affects Versions: 1.10.3
>            Reporter: t oo
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
