mattinbits opened a new issue #21171:
URL: https://github.com/apache/airflow/issues/21171


   ### Apache Airflow version
   
   2.2.2
   
   ### What happened
   
   Airflow schedules a task an hour earlier than expected, when using an MSSQL 
metadata database where the DB server is set to the CET timezone. The 
screenshot below shows the DAG starting an hour before the end of the data 
interval.
   
   <img width="220" alt="image" 
src="https://user-images.githubusercontent.com/3765307/151439921-43b2ac5b-3e93-4337-a311-82e694ad3084.png";>
   
   
   ### What you expected to happen
   
   Airflow schedules the task at the correct time in UTC.
   
   ### How to reproduce
   
   It's hard to describe a complete reproducible method since it relies on 
having an MSSQL Server with particular settings.
   
   A relevant DAG would be a simple as:
   
   ```
   with DAG(
       dag_id="example_dag",
       start_date=datetime(2021, 1, 1),
       schedule_interval="0 9 * * 1-5",
   ) as dag:
       task = DummyOperator(task_id="dummy")
   ```
   
   And Airflow config of:
   
   ```
   default_timezone = utc
   ```
   
   This DAG would then be scheduled an hour earlier than expected.
   
   ### Operating System
   
   Redhat UBI 8
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   Airflow scheduler and webserver each running in a docker container based on 
Redhat UBI 8. Metadata DB is MSSQL Server running on a Windows Server where the 
server timezone is CET.
   
   ### Anything else
   
   In our installation, the problem is happening for any DAG with a UTC based 
schedule.
   
   I believe the root cause is this line of code:
   
https://github.com/apache/airflow/blob/6405d8f804e7cbd1748aa7eed65f2bbf0fcf022e/airflow/models/dag.py#L2872
   
   On MSSQL, `func.now()` appears to correspond to `GETDATE()`, which returns 
the current time in the timezone of the DB server. But 
`next_dagrun_create_after` is stored in the database as UTC (in a `datetime2` 
column, which doesn't include timezone information). So this line of code is 
equivalent to "Is the current time in CET before the next creation time in 
UTC?", meaning that a DAG that should start at 09:00 UTC starts at 09:00 CET 
instead, one hour early.
   
   I can verify that `func.now()` returns CET with the SQLAlchemy code 
`engine.execute(sa.select([sa.func.now()])).fetchall()`.
   
   I think the correct way to get the current time in UTC on MSSQL is 
`GETUTCDATE()`.
   
   We ran Airflow 1.10 previously without seeing this problem. From what I can 
tell, in that version the date comparison is done on the application side 
rather than in the DB.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to