[
https://issues.apache.org/jira/browse/AIRFLOW-7039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247481#comment-17247481
]
Peter Kim commented on AIRFLOW-7039:
------------------------------------
Hello, could this fix be incorporated into the Airflow 2.0
release? Is it still on anyone's radar?
Thanks, Peter.
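For anyone picking this up: the UTC shift behind the skipped run can be reproduced with a minimal stdlib sketch (my illustration, not Airflow scheduler code; assumes Python 3.9+ for zoneinfo). The two UTC instants it prints match the DagRun execution dates in the logs quoted below.

```python
# Sketch (not Airflow code) of the UTC shift behind the skipped run:
# a 6:30 AM America/New_York schedule maps to a different UTC instant
# before and after the 2020-03-08 spring-forward transition. A scheduler
# that derives Monday's run from Friday's UTC time without re-applying
# the zone rules looks for 11:30 UTC and misses the run due at 10:30 UTC.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib in Python 3.9+

NY = ZoneInfo("America/New_York")

friday = datetime(2020, 3, 6, 6, 30, tzinfo=NY)  # before DST: EST, UTC-5
monday = datetime(2020, 3, 9, 6, 30, tzinfo=NY)  # after DST: EDT, UTC-4

print(friday.astimezone(timezone.utc))  # 2020-03-06 11:30:00+00:00
print(monday.astimezone(timezone.utc))  # 2020-03-09 10:30:00+00:00
```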
> Specific DAG Schedule & DST Results in Skipped DAG Run
> ------------------------------------------------------
>
> Key: AIRFLOW-7039
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7039
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Affects Versions: 1.10.7
> Environment: Amazon Linux 2 AMI
> Reporter: Peter Kim
> Priority: Critical
> Labels: timezone
>
> *Scenario:*
> The EC2 instance running Airflow is in Eastern Time (America/New_York), and
> airflow.cfg > [core] > default_timezone = America/New_York (this adjusts
> automatically for DST).
> On the Monday morning after Daylight Saving Time took effect, a handful of
> DAG runs were not executed as expected. Strangely, these were the only DAGs
> that misbehaved; all other DAGs ran normally. Additionally, only the first
> expected run after DST was skipped; subsequent runs later that day were
> scheduled successfully.
> Here is the pattern observed:
> Schedule of the DAGs that skipped their first run: (0 <hour_a>,<hour_b> * * 1,2,3,4,5),
> i.e. Monday-Friday, with two distinct runs per day.
> DAGs that run at a single time Monday-Friday, and DAGs that run at two times
> but not Monday-Friday, did not experience this issue.
>
> Based on the logs, it appears the missed run was never seen by the
> scheduler at all (see below):
>
>
> 2020-03-06 6:30 AM ET (BEFORE DST, EXPECTED BEHAVIOR):
> [2020-03-06 06:31:01,220] {logging_mixin.py:112} INFO - [2020-03-06 06:31:01,220] {settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=697
> [2020-03-06 06:31:01,222] {scheduler_job.py:153} INFO - Started process (PID=697) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 06:31:01,228] {scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
> [2020-03-06 06:31:01,228] {logging_mixin.py:112} INFO - [2020-03-06 06:31:01,228] {dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 06:31:01,238] {scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 06:31:01,305] {scheduler_job.py:1262} INFO - Processing <DAG_NAME>
> [2020-03-06 06:31:01,348] {logging_mixin.py:112} WARNING - /home/ec2-user/venv/lib64/python3.7/site-packages/pymysql/cursors.py:170: Warning: (1300, "Invalid utf8mb4 character string: '80047D'") result = self._query(query)
> [2020-03-06 06:31:01,362] {scheduler_job.py:1272} INFO - Created <DagRun <DAG_NAME> @ 2020-03-05T15:30:00+00:00: scheduled__2020-03-05T15:30:00+00:00, externally triggered: False>
> [2020-03-06 06:31:01,366] {scheduler_job.py:740} INFO - Examining DAG run <DagRun <DAG_NAME> @ 2020-03-05 15:30:00+00:00: scheduled__2020-03-05T15:30:00+00:00, externally triggered: False>
> [2020-03-06 06:31:01,389] {scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
> [2020-03-06 06:31:01,395] {scheduler_job.py:1613} INFO - Creating / updating <TaskInstance: <DAG_NAME>.<TASK_NAME> 2020-03-05 15:30:00+00:00 [scheduled]> in ORM
> [2020-03-06 06:31:01,414] {scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.192 seconds
>
> 2020-03-06 10 AM ET (BEFORE DST, EXPECTED BEHAVIOR):
> [2020-03-06 10:30:00,083] {logging_mixin.py:112} INFO - [2020-03-06 10:30:00,082] {settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=16194
> [2020-03-06 10:30:00,085] {scheduler_job.py:153} INFO - Started process (PID=16194) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 10:30:00,090] {scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
> [2020-03-06 10:30:00,090] {logging_mixin.py:112} INFO - [2020-03-06 10:30:00,090] {dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 10:30:00,099] {scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 10:30:00,159] {scheduler_job.py:1262} INFO - Processing <DAG_NAME>
> [2020-03-06 10:30:00,193] {logging_mixin.py:112} WARNING - /home/ec2-user/venv/lib64/python3.7/site-packages/pymysql/cursors.py:170: Warning: (1300, "Invalid utf8mb4 character string: '80047D'") result = self._query(query)
> [2020-03-06 10:30:00,207] {scheduler_job.py:1272} INFO - Created <DagRun <DAG_NAME> @ 2020-03-06T11:30:00+00:00: scheduled__2020-03-06T11:30:00+00:00, externally triggered: False>
> [2020-03-06 10:30:00,212] {scheduler_job.py:740} INFO - Examining DAG run <DagRun <DAG_NAME> @ 2020-03-06 11:30:00+00:00: scheduled__2020-03-06T11:30:00+00:00, externally triggered: False>
> [2020-03-06 10:30:00,232] {scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
> [2020-03-06 10:30:00,236] {scheduler_job.py:1613} INFO - Creating / updating <TaskInstance: <DAG_NAME>.<TASK_NAME> 2020-03-06 11:30:00+00:00 [scheduled]> in ORM
> [2020-03-06 10:30:00,251] {scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.166 seconds
>
> 2020-03-09 6:30 AM ET (AFTER DST, *UNEXPECTED BEHAVIOR*):
> [2020-03-09 06:31:55,240] {logging_mixin.py:112} INFO - [2020-03-09 06:31:55,240] {settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=23336
> [2020-03-09 06:31:55,242] {scheduler_job.py:153} INFO - Started process (PID=23336) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 06:31:55,247] {scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
> [2020-03-09 06:31:55,247] {logging_mixin.py:112} INFO - [2020-03-09 06:31:55,247] {dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 06:31:55,257] {scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 06:31:55,312] {scheduler_job.py:1262} INFO - Processing <DAG_NAME>
> [2020-03-09 06:31:55,338] {scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
> [2020-03-09 06:31:55,343] {scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.101 seconds
>
> 2020-03-09 10 AM ET (AFTER DST, EXPECTED BEHAVIOR):
> [2020-03-09 10:30:54,487] {logging_mixin.py:112} INFO - [2020-03-09 10:30:54,487] {settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=6421
> [2020-03-09 10:30:54,490] {scheduler_job.py:153} INFO - Started process (PID=6421) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 10:30:54,498] {scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
> [2020-03-09 10:30:54,499] {logging_mixin.py:112} INFO - [2020-03-09 10:30:54,499] {dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 10:30:54,508] {scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 10:30:54,585] {scheduler_job.py:1262} INFO - Processing <DAG_NAME>
> [2020-03-09 10:30:54,621] {logging_mixin.py:112} WARNING - /home/ec2-user/venv/lib64/python3.7/site-packages/pymysql/cursors.py:170: Warning: (1300, "Invalid utf8mb4 character string: '80047D'") result = self._query(query)
> [2020-03-09 10:30:54,634] {scheduler_job.py:1272} INFO - Created <DagRun <DAG_NAME> @ 2020-03-09T10:30:00+00:00: scheduled__2020-03-09T10:30:00+00:00, externally triggered: False>
> [2020-03-09 10:30:54,638] {scheduler_job.py:740} INFO - Examining DAG run <DagRun <DAG_NAME> @ 2020-03-09 10:30:00+00:00: scheduled__2020-03-09T10:30:00+00:00, externally triggered: False>
> [2020-03-09 10:30:54,660] {scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
> [2020-03-09 10:30:54,665] {scheduler_job.py:1613} INFO - Creating / updating <TaskInstance: <DAG_NAME>.<TASK_NAME> 2020-03-09 10:30:00+00:00 [scheduled]> in ORM
> [2020-03-09 10:30:54,683] {scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.193 seconds
--
This message was sent by Atlassian Jira
(v8.3.4#803005)