Jarek Potiuk reassigned AIRFLOW-7039:
-------------------------------------
    Assignee:     (was: Jarek Potiuk)

> Specific DAG Schedule & DST Results in Skipped DAG Run
> ------------------------------------------------------
>
>                 Key: AIRFLOW-7039
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-7039
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.7
>         Environment: Amazon Linux 2 AMI
>            Reporter: Peter Kim
>            Priority: Critical
>              Labels: timezone
>
> *Scenario:*
> The EC2 instance running Airflow is in Eastern Time (America/New_York), and
> airflow.cfg > [core] > default_timezone = America/New_York (the system clock
> changes correctly for DST).
>
> On the Monday morning after Daylight Saving Time took effect, a handful of
> DAG runs were not executed as expected. The strange part is that these DAGs
> were the only jobs that misbehaved; all other DAGs ran normally.
> Additionally, only the first expected run after DST was skipped; subsequent
> runs later that day were scheduled successfully.
>
> Here is the pattern observed. The DAG schedules that skipped their first run
> were of the form (0 <hour_a>,<hour_b> * * 1,2,3,4,5), i.e. scheduled M-F
> with two distinct runs per day. DAGs that run at one time M-F, and DAGs that
> run at two times but not M-F, did not experience this issue.
>
> Based on the logs, it appears the missed run was never seen by the scheduler
> at all (see below):
>
> 2020-03-06 6:30 AM ET (BEFORE DST, EXPECTED BEHAVIOR):
> [2020-03-06 06:31:01,220] {logging_mixin.py:112} INFO - [2020-03-06 06:31:01,220] {settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=697
> [2020-03-06 06:31:01,222] {scheduler_job.py:153} INFO - Started process (PID=697) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 06:31:01,228] {scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
> [2020-03-06 06:31:01,228] {logging_mixin.py:112} INFO - [2020-03-06 06:31:01,228] {dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 06:31:01,238] {scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 06:31:01,305] {scheduler_job.py:1262} INFO - Processing <DAG_NAME>
> [2020-03-06 06:31:01,348] {logging_mixin.py:112} WARNING - /home/ec2-user/venv/lib64/python3.7/site-packages/pymysql/cursors.py:170: Warning: (1300, "Invalid utf8mb4 character string: '80047D'") result = self._query(query)
> [2020-03-06 06:31:01,362] {scheduler_job.py:1272} INFO - Created <DagRun <DAG_NAME> @ 2020-03-05T15:30:00+00:00: scheduled__2020-03-05T15:30:00+00:00, externally triggered: False>
> [2020-03-06 06:31:01,366] {scheduler_job.py:740} INFO - Examining DAG run <DagRun <DAG_NAME> @ 2020-03-05 15:30:00+00:00: scheduled__2020-03-05T15:30:00+00:00, externally triggered: False>
> [2020-03-06 06:31:01,389] {scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
> [2020-03-06 06:31:01,395] {scheduler_job.py:1613} INFO - Creating / updating <TaskInstance: <DAG_NAME>.<TASK_NAME> 2020-03-05 15:30:00+00:00 [scheduled]> in ORM
> [2020-03-06 06:31:01,414] {scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.192 seconds
>
> 2020-03-06 10:30 AM ET (BEFORE DST, EXPECTED BEHAVIOR):
> [2020-03-06 10:30:00,083] {logging_mixin.py:112} INFO - [2020-03-06 10:30:00,082] {settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=16194
> [2020-03-06 10:30:00,085] {scheduler_job.py:153} INFO - Started process (PID=16194) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 10:30:00,090] {scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
> [2020-03-06 10:30:00,090] {logging_mixin.py:112} INFO - [2020-03-06 10:30:00,090] {dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 10:30:00,099] {scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 10:30:00,159] {scheduler_job.py:1262} INFO - Processing <DAG_NAME>
> [2020-03-06 10:30:00,193] {logging_mixin.py:112} WARNING - /home/ec2-user/venv/lib64/python3.7/site-packages/pymysql/cursors.py:170: Warning: (1300, "Invalid utf8mb4 character string: '80047D'") result = self._query(query)
> [2020-03-06 10:30:00,207] {scheduler_job.py:1272} INFO - Created <DagRun <DAG_NAME> @ 2020-03-06T11:30:00+00:00: scheduled__2020-03-06T11:30:00+00:00, externally triggered: False>
> [2020-03-06 10:30:00,212] {scheduler_job.py:740} INFO - Examining DAG run <DagRun <DAG_NAME> @ 2020-03-06 11:30:00+00:00: scheduled__2020-03-06T11:30:00+00:00, externally triggered: False>
> [2020-03-06 10:30:00,232] {scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
> [2020-03-06 10:30:00,236] {scheduler_job.py:1613} INFO - Creating / updating <TaskInstance: <DAG_NAME>.<TASK_NAME> 2020-03-06 11:30:00+00:00 [scheduled]> in ORM
> [2020-03-06 10:30:00,251] {scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.166 seconds
>
> 2020-03-09 6:30 AM ET (AFTER DST, *UNEXPECTED BEHAVIOR*):
> [2020-03-09 06:31:55,240] {logging_mixin.py:112} INFO - [2020-03-09 06:31:55,240] {settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=23336
> [2020-03-09 06:31:55,242] {scheduler_job.py:153} INFO - Started process (PID=23336) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 06:31:55,247] {scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
> [2020-03-09 06:31:55,247] {logging_mixin.py:112} INFO - [2020-03-09 06:31:55,247] {dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 06:31:55,257] {scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 06:31:55,312] {scheduler_job.py:1262} INFO - Processing <DAG_NAME>
> [2020-03-09 06:31:55,338] {scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
> [2020-03-09 06:31:55,343] {scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.101 seconds
>
> 2020-03-09 10:30 AM ET (AFTER DST, EXPECTED BEHAVIOR):
> [2020-03-09 10:30:54,487] {logging_mixin.py:112} INFO - [2020-03-09 10:30:54,487] {settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=6421
> [2020-03-09 10:30:54,490] {scheduler_job.py:153} INFO - Started process (PID=6421) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 10:30:54,498] {scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
> [2020-03-09 10:30:54,499] {logging_mixin.py:112} INFO - [2020-03-09 10:30:54,499] {dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 10:30:54,508] {scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 10:30:54,585] {scheduler_job.py:1262} INFO - Processing <DAG_NAME>
> [2020-03-09 10:30:54,621] {logging_mixin.py:112} WARNING - /home/ec2-user/venv/lib64/python3.7/site-packages/pymysql/cursors.py:170: Warning: (1300, "Invalid utf8mb4 character string: '80047D'") result = self._query(query)
> [2020-03-09 10:30:54,634] {scheduler_job.py:1272} INFO - Created <DagRun <DAG_NAME> @ 2020-03-09T10:30:00+00:00: scheduled__2020-03-09T10:30:00+00:00, externally triggered: False>
> [2020-03-09 10:30:54,638] {scheduler_job.py:740} INFO - Examining DAG run <DagRun <DAG_NAME> @ 2020-03-09 10:30:00+00:00: scheduled__2020-03-09T10:30:00+00:00, externally triggered: False>
> [2020-03-09 10:30:54,660] {scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
> [2020-03-09 10:30:54,665] {scheduler_job.py:1613} INFO - Creating / updating <TaskInstance: <DAG_NAME>.<TASK_NAME> 2020-03-09 10:30:00+00:00 [scheduled]> in ORM
> [2020-03-09 10:30:54,683] {scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.193 seconds

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
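Not part of the report, but the UTC-offset shift visible in the logged execution dates (2020-03-06T11:30:00+00:00 before the transition vs. 2020-03-09T10:30:00+00:00 after) can be reproduced with a minimal stdlib sketch. The 6:30 local times below are illustrative, matching the reporter's log excerpts; the actual schedule hours were redacted:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

NY = ZoneInfo("America/New_York")

# The same 6:30 AM wall-clock run on either side of the 2020-03-08
# DST transition (dates taken from the report's logs).
before_dst = datetime(2020, 3, 6, 6, 30, tzinfo=NY)  # Friday, EST (UTC-5)
after_dst = datetime(2020, 3, 9, 6, 30, tzinfo=NY)   # Monday, EDT (UTC-4)

# The wall-clock time is unchanged, but the UTC instant moves back an hour,
# which is the gap the scheduler has to bridge for the first post-DST run.
print(before_dst.astimezone(timezone.utc))  # 2020-03-06 11:30:00+00:00
print(after_dst.astimezone(timezone.utc))   # 2020-03-09 10:30:00+00:00
```

A scheduler that derives the next run from the previous execution date in UTC, rather than re-evaluating the cron expression in the local timezone, would look for the first Monday run at the old offset and miss the earlier UTC instant, which is consistent with the 6:30 run never being created while the 10:30 run recovered.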