[ 
https://issues.apache.org/jira/browse/AIRFLOW-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277884#comment-15277884
 ] 

Amikam Snir edited comment on AIRFLOW-47 at 5/10/16 10:07 AM:
--------------------------------------------------------------

[~hilaviz], please open a new issue or edit the description.
The problem is that you have hit a deadlock.
The daily DAG instances consume all the resources. The daily DAG depends on the
hourly DAG, but the resources (e.g. workers) are already occupied by the daily
instances.
A DAG that waits for something to happen, e.g. an external DAG, should signal
the scheduler (voluntarily giving up its turn).
Those instances should move to a waiting queue, similar to the concept of OS
scheduling queues.
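
To make the deadlock concrete, here is a minimal toy model in plain Python. It is not Airflow's scheduler, and every name in it (`simulate`, `worker_slots`, `sensors_yield`, the task names) is hypothetical. It assumes two daily sensors are queued ahead of the one hourly task they wait for, and compares sensors that hold their worker slot while poking with sensors that release the slot and move to a waiting queue.

```python
from collections import deque


def simulate(worker_slots, sensors_yield, max_ticks=10):
    """Return the tick at which all tasks finish, or None if they never do.

    Toy model only: task names and queue behaviour are assumptions made
    for illustration, not Airflow internals.
    """
    # Two daily sensors are queued ahead of the hourly task they depend on.
    ready = deque(["daily_sensor_1", "daily_sensor_2", "hourly_task"])
    waiting = []   # sensors that gave up their slot (yielding mode only)
    running = []   # tasks currently holding a worker slot
    done = set()

    for tick in range(1, max_ticks + 1):
        # Fill free worker slots from the front of the ready queue.
        while ready and len(running) < worker_slots:
            running.append(ready.popleft())

        still_running = []
        for task in running:
            if task == "hourly_task":
                done.add(task)              # ordinary work finishes in one tick
            elif "hourly_task" in done:
                done.add(task)              # the sensor's poke succeeds
            elif sensors_yield:
                waiting.append(task)        # release the slot and wait aside
            else:
                still_running.append(task)  # keep the slot and poke again
        running = still_running

        # Wake the waiting sensors once their dependency is satisfied.
        if "hourly_task" in done:
            ready.extend(waiting)
            waiting = []

        if len(done) == 3:
            return tick
    return None  # no progress within max_ticks: the sensors starved the task


print(simulate(worker_slots=2, sensors_yield=False))  # blocking sensors
print(simulate(worker_slots=2, sensors_yield=True))   # yielding sensors
```

With blocking sensors the hourly task never gets a slot, because both slots stay occupied by sensors waiting for it. With yielding sensors the same workload completes, even with `worker_slots=1`, which roughly corresponds to an executor that runs one task at a time.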





> ExternalTaskSensor causes scheduling dead lock
> ----------------------------------------------
>
>                 Key: AIRFLOW-47
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-47
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: operators, scheduler
>    Affects Versions: Airflow 1.7.0
>         Environment: CentOS 6.5
> Airflow 1.7.0 with SequentialExecutor
>            Reporter: Hila Visan
>            Priority: Trivial
>
> We are trying to use 'ExternalTaskSensor' to coordinate between a daily DAG 
> and an hourly DAG (the daily DAG depends on the hourly one).
> Relevant code:
> *Daily DAG definition:*
> {code:title=2_daily_dag.py|borderStyle=solid}
> default_args = {
>     …
>     'start_date': datetime(2016, 4, 2),
>     …
> }
> dag = DAG(dag_id='2_daily_agg', default_args=default_args,
>           schedule_interval="@daily")
> ext_dep = ExternalTaskSensor(
>     external_dag_id='1_hourly_agg',
>     external_task_id='print_hourly1',
>     task_id='evening_hours_sensor',
>     dag=dag)
> {code}
> *Hourly DAG definition:*
> {code:title=1_hourly_dag.py|borderStyle=solid}
> default_args = {
>     …
>     'start_date': datetime(2016, 4, 1),
>     …
> }
> dag = DAG(dag_id='1_hourly_agg', default_args=default_args,
>           schedule_interval="@hourly")
> t1 = BashOperator(
>     task_id='print_hourly1',
>     bash_command='echo hourly job1',
>     dag=dag)
> {code}
> The hourly DAG was executed twice, for the following execution dates:
> 04-01T00:00:00
> 04-01T01:00:00
> Then the daily DAG was executed, and it is still running.
> According to the logs, the daily DAG is waiting for the hourly DAG to complete:
> {noformat}
> [2016-05-04 06:01:20,978] {models.py:1041} INFO - Executing<Task(ExternalTaskSensor): evening_hours_sensor> on 2016-04-03 00:00:00
> [2016-05-04 06:01:20,984] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
> [2016-05-04 06:02:21,053] {sensors.py:188} INFO - Poking for 1_hourly_agg.print_hourly1 on 2016-04-02 00:00:00 ...
> {noformat}
> How can I solve this deadlock?
> In addition, I don't understand whether this means that the daily DAG depends 
> only on the "last" hourly DAG run of the same day (23:00-24:00).
> What happens if the hourly DAG run for another hour fails?
> Thanks a lot!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
