shohamy7 opened a new issue #21089:
URL: https://github.com/apache/airflow/issues/21089


   ### Apache Airflow version
   
   2.2.3 (latest released)
   
   ### What happened
   
   I have a few DAGs in my DAG folder, which I copy into it using git-sync.
   The DAG files are present in the DAG folder, and they were last modified on Jan 24 (I checked this with `ls -l /opt/airflow/dags/repo/`).
   For example, one of the DAG files:
   `-rw-r--r-- 1 65533 root 4141 Jan 24 19:30 clear_missing_dags.py`
   However, both the scheduler's per-file log at `/opt/airflow/logs/scheduler/latest/{my_dag_file}.log` and the DAG processor manager log at `/opt/airflow/logs/dag_processor_manager/dag_processor_manager.log` show that the scheduler loaded the DAGs again on Jan 25, even though the files had not changed.
   Example from the scheduler log:
   [2022-01-25 09:46:18,615] {processor.py:654} INFO - DAG(s) dict_keys(['clear_missing_dags']) retrieved from /opt/airflow/dags/repo/clear_missing_dags.py
   [2022-01-25 09:46:18,633] {logging_mixin.py:109} INFO - [2022-01-25 09:46:18,633] {dag.py:2396} INFO - Sync 1 DAGs
   [2022-01-25 09:46:18,655] {logging_mixin.py:109} INFO - [2022-01-25 09:46:18,655] {dag.py:2935} INFO - Setting next_dagrun for clear_missing_dags to None
   [**2022-01-25 09:46:18,676**] {processor.py:171} INFO - Processing /opt/airflow/dags/repo/clear_missing_dags.py took 0.186 seconds
   Example from the DAG processor manager log:
   DAG File Processing Stats
   
   File Path                                     PID    Runtime      # DAGs    # Errors  Last Runtime    Last Run
   --------------------------------------------  -----  ---------  --------  ----------  --------------  -------------------
   /opt/airflow/dags/repo/bash_example.py                                 0           1  0.15s           2022-01-25T09:48:20
   /opt/airflow/dags/repo/branch_datetime.py                              0           1  0.15s           2022-01-25T09:48:26
   /opt/airflow/dags/repo/python_example.py                               1           0  0.20s           2022-01-25T09:48:33
   /opt/airflow/dags/repo/clear_missing_dags.py                           1           0  0.17s           2022-01-25T09:48:20
   
================================================================================
   [**2022-01-25 09:48:48,730**] {manager.py:1065} INFO - Finding 'running' jobs without a recent heartbeat
   [2022-01-25 09:48:48,731] {manager.py:1069} INFO - Failing jobs without heartbeat after 2022-01-25 09:43:48.731074+00:00
   As far as I know, the scheduler checks whether a DAG has changed by checking whether the file's modification time has changed since the last time the DAG was loaded.
   It seems that this check is not working.
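A minimal Python sketch of the mtime-gated re-parsing described above, to make the expectation concrete. It assumes the check works as stated (skip a file whose modification time is unchanged since its last parse); `MtimeGatedParser`, `parse_if_changed`, and `demo` are hypothetical names for illustration only, and this is not Airflow's DAG processor manager code.

```python
import os
import tempfile
from typing import Dict, List


class MtimeGatedParser:
    """Hypothetical sketch: re-parse a DAG file only when its mtime changes.

    Illustrates the expected behavior; it is not Airflow's implementation.
    """

    def __init__(self) -> None:
        # file path -> modification time observed at the last parse
        self._last_seen_mtime: Dict[str, float] = {}

    def parse_if_changed(self, path: str) -> bool:
        """Return True if the file was (re)parsed, False if it was skipped."""
        mtime = os.path.getmtime(path)
        if self._last_seen_mtime.get(path) == mtime:
            return False  # unchanged since the last parse: skip it
        # Placeholder for the expensive work (importing the file, collecting
        # DAGs, applying cluster policies, syncing to the metadata DB).
        self._last_seen_mtime[path] = mtime
        return True


def demo() -> List[bool]:
    """Parse a temp file twice, bump its mtime, then parse it once more."""
    fd, path = tempfile.mkstemp(suffix=".py")
    os.close(fd)
    try:
        parser = MtimeGatedParser()
        results = [parser.parse_if_changed(path)]      # first sight: parsed
        results.append(parser.parse_if_changed(path))  # unchanged: skipped
        st = os.stat(path)
        os.utime(path, (st.st_atime, st.st_mtime + 60))  # simulate an edit
        results.append(parser.parse_if_changed(path))  # modified: parsed
        return results
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())  # [True, False, True]
```

Under this model, the Jan 24 files above would be parsed once and then skipped on every later pass until git-sync actually changes them.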
   
   ### What you expected to happen
   
   I expected the scheduler not to load the DAG again until its file changes.
   
   ### How to reproduce
   
   This happens with the default Helm chart deployment (I used `helm install airflow .`).
   You can reproduce it by deploying the chart and creating a DAG file inside the DAG folder.
   
   ### Operating System
   
   Debian GNU/Linux 10 (buster)
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   I used the default values from the Helm chart and configured only the git-sync option.
   
   ### Anything else
   
   This happens every time the DAGs are loaded, so the scheduler runs the cluster policies every X seconds instead of only when a DAG has changed.
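One way to observe how often the cluster policies run is a policy that only logs each call. This is a sketch assuming the Airflow 2.x `dag_policy(dag)` hook defined in `airflow_local_settings.py`, which, as far as I know, is applied each time a DAG is added to a DagBag (that is, on every parse). The log message is a hypothetical diagnostic, not something Airflow emits itself.

```python
import logging

log = logging.getLogger(__name__)


def dag_policy(dag) -> None:
    """Cluster policy hook, assumed to live in airflow_local_settings.py.

    Assumed to be called once per DAG on every parse, so each call leaves
    one timestamped log line, showing how often the policy really runs.
    """
    # Hypothetical diagnostic: which DAG the policy saw and which file it came from.
    log.info("dag_policy applied to %s from %s", dag.dag_id, dag.fileloc)
```

If the DAGs are re-parsed on a timer rather than on change, these lines should repeat at roughly the same interval as the "Last Run" times in the processor stats, even while the files stay untouched.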
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

