marcosmartinezfco opened a new issue, #60223:
URL: https://github.com/apache/airflow/issues/60223

   ### Apache Airflow version
   
   3.1.5
   
   ### If "Other Airflow 3 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   We are using DAG bundles stored in S3. The Airflow “control plane” 
(scheduler / DAG processor) downloads bundles to a local folder under 
`/tmp/airflow/<bundle-name>/` for parsing and UI display. Celery workers also 
download the bundle so they can execute tasks.
   
   We are seeing intermittent/sticky behavior where **updated DAG files are 
successfully uploaded to S3**, but **Airflow does not download the new 
version**. Instead, Airflow logs:
   
    `Local file ... is up-to-date with S3 object ... Skipping download.`
   
   Even after waiting several minutes and multiple DAG processor loops, the 
local files under `/tmp/airflow/...` do not change. If we **manually delete the 
local bundle directory**, the next loop re-downloads the bundle and picks up 
changes.
   
   This can impact:
   - Control plane: UI shows stale DAG code until the cache is manually deleted 
/ container restarted.
   - Workers: tasks may execute with stale DAG code (we expected workers to 
re-download on each run in our setup, but they can also appear stale).
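   The manual workaround above can be scripted until the root cause is fixed. A minimal sketch (the bundle name is a placeholder; the `/tmp/airflow/...` path is the one used in our deployment):

    ```python
    import shutil

    # Placeholder -- substitute the name of the bundle that is stale.
    bundle_name = "my-bundle"

    # Deleting the cached copy forces the next DAG processor loop to
    # re-download the whole bundle from S3.
    shutil.rmtree(f"/tmp/airflow/{bundle_name}", ignore_errors=True)
    ```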
   
   
   ### What you think should happen instead?
   
   
   When the object in S3 changes (new upload), the next bundle sync should 
download the updated object and refresh the local bundle directory without 
requiring manual deletion of local files.
   
   
   ### How to reproduce
   
   
   We were able to reproduce this more consistently when the DAG change is 
**only within templated fields**, specifically the `bash_command` argument of 
`BashOperator` (i.e. a change inside the string that gets templated at runtime).
   
   Empirically:
   
   - If we make a change that is **only inside** 
`BashOperator(bash_command=...)`, the S3 bundle sync sometimes logs that the 
local file is “up-to-date” and **does not re-download** the updated DAG file 
(stale `/tmp/airflow/...`).
   - If we make a change **outside** of the templated `bash_command` string 
(e.g., a comment, a constant, a non-templated field), the change is detected 
much more reliably and the updated file is downloaded.
   
   This makes the issue appear correlated with updates that only affect 
templated sections of the DAG file (though we have not proven causation).
   
   ```python
   from datetime import datetime, timedelta
   
   from airflow.providers.standard.operators.bash import BashOperator
   from airflow.sdk import DAG
   
   default_args = {
       "owner": "owner",
       "retries": 1,
       "retry_delay": timedelta(minutes=1),
       "execution_timeout": timedelta(minutes=5),
       "start_date": datetime(2026, 1, 1),
       "queue": "queue",
   }
   
   # dummy comment
   with DAG(
       dag_id="dummy",
       default_args=default_args,
       schedule="0 0 * * *",
       catchup=False,
       tags=["test"],
   ):
        hello_world = BashOperator(
            task_id="print_hello_world",
            bash_command="echo 'Hello World from dummy DAG bundle test! ;)'",
        )
    ```
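   One hypothesis consistent with the logs: if the sync decides "up-to-date" by comparing sizes, then a same-length edit inside `bash_command` is invisible to it. A minimal illustration (not Airflow's actual code, just the hypothesized check):

    ```python
    # Two versions of the templated string; the edit changes content but not length.
    old = "bash_command=\"echo 'Hello World from dummy DAG bundle test! ;)'\""
    new = "bash_command=\"echo 'Hello Again from dummy DAG bundle test! ;)'\""

    def is_up_to_date(local_size: int, remote_size: int) -> bool:
        # Hypothetical size-only comparison, as suggested by the
        # "is up-to-date with S3 object ... Skipping download." log line.
        return local_size == remote_size

    assert len(old) == len(new)               # same size on disk
    assert is_up_to_date(len(old), len(new))  # download skipped despite the change
    ```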
   
   
   ### Operating System
   
   AlmaLinux 9.5 (Teal Serval)
   
   ### Versions of Apache Airflow Providers
   
   airflow-exporter                          1.6.0
   apache-airflow                            3.1.5
   apache-airflow-core                       3.1.5
   apache-airflow-providers-amazon           9.18.0
   apache-airflow-providers-celery           3.14.0
   apache-airflow-providers-cncf-kubernetes  10.11.0
   apache-airflow-providers-common-compat    1.10.0
   apache-airflow-providers-common-io        1.7.0
   apache-airflow-providers-common-messaging 2.0.1
   apache-airflow-providers-common-sql       1.30.0
   apache-airflow-providers-docker           4.5.0
   apache-airflow-providers-elasticsearch    6.4.0
   apache-airflow-providers-fab              3.0.3
   apache-airflow-providers-ftp              3.14.0
   apache-airflow-providers-git              0.1.0
   apache-airflow-providers-google           19.1.0
   apache-airflow-providers-grpc             3.9.0
   apache-airflow-providers-hashicorp        4.4.0
   apache-airflow-providers-http             5.6.0
   apache-airflow-providers-microsoft-azure  12.9.0
   apache-airflow-providers-mysql            6.4.0
   apache-airflow-providers-odbc             4.11.0
   apache-airflow-providers-openlineage      2.9.0
   apache-airflow-providers-postgres         6.5.0
   apache-airflow-providers-redis            4.4.0
   apache-airflow-providers-sendgrid         4.2.0
   apache-airflow-providers-sftp             5.5.0
   apache-airflow-providers-slack            9.6.0
   apache-airflow-providers-smtp             2.4.0
   apache-airflow-providers-snowflake        6.7.0
   apache-airflow-providers-ssh              3.14.0
   apache-airflow-providers-standard         1.10.0
   apache-airflow-task-sdk                   1.1.5
   google-cloud-orchestration-airflow        1.18.0
   
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   We have one control plane running the Airflow components, and N Celery 
queues with one Celery worker per queue.
   
   The metadata DB is in RDS; Redis runs on the control plane as the Celery 
broker.
   
   ### Anything else?
   
   
   We have multiple S3 bundles (different prefixes / bundle names). The issue 
reproduces for some bundles more often than others, but we were eventually able 
to reproduce it for multiple bundles.
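   To rule out a size coincidence when debugging, the local copy can be compared against the object's checksum instead. For single-part S3 uploads the ETag is the object's MD5 digest (multipart ETags carry a `-N` suffix and are not plain MD5s). A stdlib-only helper sketch (the ETag itself would come from e.g. a `head_object` call):

    ```python
    import hashlib

    def local_md5(path: str) -> str:
        """MD5 of a local file, streamed in chunks."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    def matches_etag(path: str, s3_etag: str) -> bool:
        # head_object returns the ETag wrapped in double quotes; strip them.
        # Only meaningful for single-part uploads (no "-N" suffix).
        return s3_etag.strip('"') == local_md5(path)
    ```

   If this returns False while the sync still logs "up-to-date", the size/mtime heuristic is skipping a genuinely changed object.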
   
   In logs, when the change is detected correctly, we see something like:
   
    ```text
    S3 object size (20372) and local file size (20371) differ. Downloaded <dag>.py to /tmp/airflow/<bundle>/<dag>.py
    ```
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

