gor-obr opened a new issue #7999: Incorrect DAG scheduling after DST URL: https://github.com/apache/airflow/issues/7999 **Apache Airflow version**: 1.10.9, 2.0.0dev **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): **Environment**: - **Cloud provider or hardware configuration**: - **OS** (e.g. from /etc/os-release): - **Kernel** (e.g. `uname -a`): - **Install tools**: - **Others**: **What happened**: If DAG's cron contains "non trivial" hours section, Scheduler will not schedule DAG correctly immediately after DST switch. In this case, "non-trivial" means either having an interval (`0 7-8 * * *`) or multiple values (`0 7,9 * * *`). If DST occurs in morning (2-3 AM of March 8th), following following executions will occur: ``` 0 7-8 * * * "2020-03-07T07:00:00-08:00" "2020-03-07T08:00:00-08:00" # DST switch after this run "2020-03-08T08:00:00-07:00" # 8 AM instead of 7 AM 0 7,9 * * * "2020-03-07T07:00:00-08:00" "2020-03-07T09:00:00-08:00" "2020-03-08T07:00:00-07:00" # DST switch after this run "2020-03-08T08:00:00-07:00" # 8 AM instead of 7 AM ``` Cause for this is the method `is_fixed_time_schedule` in `dag.py`, which tests whether cron is "fixed" (i.e. "execute exactly at this time") or "relative" (i.e. "execute on each n hours"). Method relies on a quite crude test, it calculates two subsequent times and checks whether both hours and minutes of them are the same: ```python now = datetime.now() cron = croniter(self._schedule_interval, now) start = cron.get_next(datetime) cron_next = cron.get_next(datetime) if cron_next.minute == start.minute and cron_next.hour == start.hour: return True return False ``` This is not satisfied in case of above examples (it is executed at two different hours during each day, so hours in subsequent executions are never the same). Based on this, method `following_schedule` in `dag.py` thinks that this DAG should be executed "on each n hours", and explicitly works around DST. It calculates the amount of time which needs to pass until next run, and adds that amount of time to the previous run, thus ignoring DST. ```python # We assume that DST transitions happen on the minute/hour if not self.is_fixed_time_schedule(): # relative offset (eg. every 5 minutes) delta = cron.get_next(datetime) - naive following = dttm.in_timezone(self.timezone).add_timedelta(delta) else: # absolute (e.g. 3 AM) naive = cron.get_next(datetime) tz = pendulum.timezone(self.timezone.name) following = timezone.make_aware(naive, tz) ``` Note that only the first execution (or first few executions) will be affected. After them, the calculation will stabilize and will work correctly going forward. This is describes the same issue as https://issues.apache.org/jira/browse/AIRFLOW-7039 **What you expected to happen**: <!-- What do you think went wrong? --> **How to reproduce it**: **Anything else we need to know**:
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services