[
https://issues.apache.org/jira/browse/AIRFLOW-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878486#comment-16878486
]
ASF subversion and git services commented on AIRFLOW-4510:
----------------------------------------------------------
Commit 75872a2c41e18cdabed45dc7c2ecf5f513c14d3d in airflow's branch
refs/heads/master from Abhishek Ray
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=75872a2 ]
[AIRFLOW-4510] Don't mutate default_args during DAG initialization (#5277)
While initializing a DAG, default_args is being mutated. If another
DAG is created in the same file with the same default_args, it gets
initialized with the incorrect Timezone information.
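
The failure mode described in the commit message can be reproduced without Airflow. The sketch below is a minimal stand-in, not Airflow's code: `FakeDag` and `LOCAL_TZ` are hypothetical names, and the fixed UTC-4 offset is an assumption standing in for America/New_York in May. It shows how a constructor that writes a converted `start_date` back into the caller's dict changes what the next constructor call receives.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-4 offset stands in for the configured default timezone.
LOCAL_TZ = timezone(timedelta(hours=-4))


class FakeDag:
    """Hypothetical stand-in for a DAG-like class; not Airflow's implementation."""

    def __init__(self, dag_id, default_args):
        self.dag_id = dag_id
        # Record whether start_date arrived naive, before any conversion.
        self.saw_naive_start_date = default_args["start_date"].tzinfo is None
        # Bug being illustrated: this keeps a reference to the caller's dict ...
        self.default_args = default_args
        start = default_args["start_date"]
        if start.tzinfo is None:
            start = start.replace(tzinfo=LOCAL_TZ)
        # ... and writes the UTC-converted value back into that same dict.
        self.default_args["start_date"] = start.astimezone(timezone.utc)


shared_args = {"owner": "airflow", "start_date": datetime(2019, 5, 11)}
first = FakeDag("tutorial_1", shared_args)
second = FakeDag("tutorial_2", shared_args)

print(first.saw_naive_start_date)   # True
print(second.saw_naive_start_date)  # False
print(shared_args["start_date"])    # 2019-05-11 04:00:00+00:00
```

The second instance never sees the naive `datetime` the author wrote, so any logic that depends on a naive `start_date` behaves differently for it.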
> Timezone set incorrectly if multiple DAGs defined in the same file
> ------------------------------------------------------------------
>
> Key: AIRFLOW-4510
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4510
> Project: Apache Airflow
> Issue Type: Bug
> Components: DAG
> Affects Versions: 1.10.3
> Reporter: Abhishek Ray
> Assignee: Abhishek Ray
> Priority: Major
> Fix For: 2.0.0
>
> Attachments: Screen Shot 2019-05-13 at 2.41.54 PM.png, Screen Shot
> 2019-05-13 at 6.45.25 PM.png
>
>
> If multiple DAGs are defined in the same file and they share the same
> default_args, then the subsequent DAGs have an incorrect timezone.
>
> Steps to reproduce:
>
> Set default_timezone to a non-UTC value in airflow.cfg:
>
> {noformat}
> default_timezone = America/New_York{noformat}
>
> DAG definition which has multiple DAGs in the same file:
>
>
> {code:java}
> from airflow import DAG
> from airflow.operators.bash_operator import BashOperator
> from datetime import datetime, timedelta
>
> # Shared by every DAG created in this file
> default_args = {
>     'owner': 'airflow',
>     'depends_on_past': False,
>     'start_date': datetime(2019, 5, 11),
> }
>
>
> def make_dynamic_dag(schedule_interval, dag_name):
>     dag = DAG(f"tutorial_{dag_name}", default_args=default_args,
>               schedule_interval=schedule_interval)
>     t1 = BashOperator(task_id='print_date', bash_command='date', dag=dag)
>     return dag
>
>
> test_dag_1 = make_dynamic_dag("00 15 * * *", "1")
> test_dag_2 = make_dynamic_dag("00 18 * * *", "2")
> {code}
>
>
> test_dag_1 is expected to run at 15:00 EDT (19:00 UTC) and test_dag_2 is
> expected to run at 18:00 EDT (22:00 UTC).
>
> However, test_dag_2 runs at 18:00 UTC, which suggests that it has lost its
> timezone information:
> !Screen Shot 2019-05-13 at 2.41.54 PM.png!
>
> I added some logging in the Airflow code around the default_args
> initialization and it confirmed the hypothesis that the default_args were
> being mutated:
>
> {noformat}
> [2019-05-13 18:40:10,409] {__init__.py:3045} INFO - default_args for DAG
> tutorial_1: {'owner': 'airflow', 'start_date': datetime.datetime(2019, 5, 11,
> 0, 0)}
> [2019-05-13 18:40:10,410] {__init__.py:3045} INFO - default_args for DAG
> tutorial_2: {'owner': 'airflow', 'start_date': <Pendulum
> [2019-05-11T04:00:00+00:00]>}
> {noformat}
>
>
> As a simple fix, I changed the DAG definition to:
> {noformat}
> dag = DAG(f"tutorial_{dag_name}", default_args=default_args,
> schedule_interval=schedule_interval){noformat}
> and this seems to fix the problem:
>
> {noformat}
> [2019-05-13 18:44:44,674] {__init__.py:3045} INFO - default_args for DAG
> tutorial_1: {'owner': 'airflow', 'start_date': datetime.datetime(2019, 5, 11,
> 0, 0)}
> [2019-05-13 18:44:44,676] {__init__.py:3045} INFO - default_args for DAG
> tutorial_2: {'owner': 'airflow', 'start_date': datetime.datetime(2019, 5, 11,
> 0, 0)}
> {noformat}
>
> !Screen Shot 2019-05-13 at 6.45.25 PM.png!
> I would like to add a fix that creates a deep copy of default_args here:
> [https://github.com/apache/airflow/blob/master/airflow/models/dag.py#L197]
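
A defensive deep copy of the kind proposed above could look like the following sketch. It does not reproduce the change in commit 75872a2: `CopyingDag` and `LOCAL_TZ` are hypothetical names, and the fixed UTC-4 offset is an assumption standing in for America/New_York in May. The point is only that, after `copy.deepcopy`, conversions applied inside the constructor no longer reach the caller's dict.

```python
import copy
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-4 offset stands in for the configured default timezone.
LOCAL_TZ = timezone(timedelta(hours=-4))


class CopyingDag:
    """Hypothetical stand-in that copies default_args before touching them."""

    def __init__(self, dag_id, default_args=None):
        self.dag_id = dag_id
        # Deep copy: neither the dict nor any nested value is shared with the caller.
        self.default_args = copy.deepcopy(default_args or {})
        start = self.default_args.get("start_date")
        if start is not None and start.tzinfo is None:
            # The conversion now lands only in this instance's private copy.
            aware = start.replace(tzinfo=LOCAL_TZ)
            self.default_args["start_date"] = aware.astimezone(timezone.utc)


shared_args = {"owner": "airflow", "start_date": datetime(2019, 5, 11)}
dags = [CopyingDag(f"tutorial_{name}", shared_args) for name in ("1", "2")]

print(shared_args["start_date"].tzinfo)  # None
print(dags[0].default_args["start_date"] == dags[1].default_args["start_date"])  # True
```

On an affected version, passing `copy.deepcopy(default_args)` to each DAG from the DAG file should presumably give the same isolation, at the cost of repeating the copy at every call site.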