Lee-W commented on code in PR #64100:
URL: https://github.com/apache/airflow/pull/64100#discussion_r2979431833
##########
airflow-core/docs/faq.rst:
##########
@@ -237,6 +237,166 @@ There are several reasons why Dags might disappear from the UI. Common causes in
 * **Time synchronization issues** - Ensure all nodes (database, schedulers, workers) use NTP with <1s clock drift.
+.. _faq:dag-version-inflation:
+
+Why does my Dag version keep increasing?
+-----------------------------------------
+
+Every time the Dag processor parses a Dag file, it serializes the Dag and compares the result with the
+version stored in the metadata database. If anything has changed, Airflow creates a new Dag version. This
+mechanism ensures that Dag runs use consistent code throughout their execution, even if the Dag file is
+updated mid-run.
+
+**Dag version inflation** occurs when the version number increases indefinitely without the Dag author
+making any intentional changes.
+
+What goes wrong
+"""""""""""""""
+
+When Dag versions increase without meaningful changes:
+
+* The metadata database accumulates unnecessary Dag version records, increasing storage and query overhead.
+* The UI shows a misleading history of Dag changes, making it harder to identify real modifications.
+* The scheduler and API server may consume more memory as they load and cache a growing number of Dag versions.
+
+Common causes
+"""""""""""""
+
+Version inflation is caused by using values that change at **parse time** — that is, every time the Dag
+processor evaluates the Dag file — as arguments to Dag or Task constructors. The most common patterns are:
+
+**1. Using ``datetime.now()`` or ``pendulum.now()`` as ``start_date``:**
+
+.. code-block:: python
+
+ from datetime import datetime
+
+ from airflow.sdk import DAG
+
+ # BAD: datetime.now() produces a different value on every parse
+ with DAG(
+ dag_id="bad_example",
+ start_date=datetime.now(),
+ schedule="@daily",
+ ):
+ ...
Review Comment:
```suggestion
    with DAG(
        dag_id="bad_example",
        # BAD: datetime.now() produces a different value on every parse
        start_date=datetime.now(),
        schedule="@daily",
    ):
        ...
```
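The versioning mechanism described above can be mimicked in plain Python to see why a parse-time `datetime.now()` always looks like a change. This is only a toy sketch of the serialize-and-compare idea, not Airflow's actual serializer:

```python
import hashlib
import json
import time
from datetime import datetime, timezone


def parse_and_fingerprint(start_date: datetime) -> str:
    # Stand-in for one Dag-processor parse: serialize the Dag fields,
    # then hash the result the way a version comparison would.
    serialized = json.dumps({"dag_id": "bad_example", "start_date": start_date.isoformat()})
    return hashlib.sha256(serialized.encode()).hexdigest()


# Dynamic start_date: each "parse" produces a different fingerprint,
# so every parse looks like a new Dag version.
first = parse_and_fingerprint(datetime.now(timezone.utc))
time.sleep(0.01)  # the Dag processor re-parses some time later
second = parse_and_fingerprint(datetime.now(timezone.utc))
print(first == second)  # False

# Static start_date: the fingerprint is stable across parses.
print(parse_and_fingerprint(datetime(2024, 1, 1)) == parse_and_fingerprint(datetime(2024, 1, 1)))  # True
```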
+Every parse produces a different ``start_date``, so the serialized Dag is always different from the
+stored version.
+
+**2. Using random values in Dag or Task arguments:**
+
+.. code-block:: python
+
+    import random
+    from datetime import datetime
+
+    from airflow.sdk import DAG
+    from airflow.providers.standard.operators.python import PythonOperator
+
+    with DAG(dag_id="bad_random", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
+ # BAD: random value changes every parse
+ PythonOperator(
+ task_id=f"task_{random.randint(1, 1000)}",
+ python_callable=lambda: None,
+ )
Review Comment:
```suggestion
    with DAG(dag_id="bad_random", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
        PythonOperator(
            # BAD: random value changes every parse
            task_id=f"task_{random.randint(1, 1000)}",
            python_callable=lambda: None,
        )
```
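A deterministic fix for the random `task_id` pattern is to derive task names from static data instead. A small sketch (the `SOURCES` list is a hypothetical stand-in for whatever static config drives the tasks):

```python
import random


def random_task_id() -> str:
    # BAD: a different task_id on every parse, so the serialized
    # Dag never matches the stored version.
    return f"task_{random.randint(1, 1000)}"


SOURCES = ["orders", "customers"]  # hypothetical static config


def stable_task_ids() -> list:
    # GOOD: derived from static data, identical on every parse.
    return [f"load_{name}" for name in SOURCES]


print(stable_task_ids())  # ['load_orders', 'load_customers']
```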
+
+**3. Assigning runtime-varying values to variables used in constructors:**
+
+.. code-block:: python
+
+ from datetime import datetime
+
+ from airflow.sdk import DAG
+ from airflow.providers.standard.operators.python import PythonOperator
+
+ # BAD: the variable captures a parse-time value, then is passed to the DAG
+ default_args = {"start_date": datetime.now()}
+
+    with DAG(dag_id="bad_defaults", default_args=default_args, schedule="@daily") as dag:
+ PythonOperator(task_id="my_task", python_callable=lambda: None)
+
+Even though ``datetime.now()`` is not called directly inside the Dag constructor, it flows in through
+``default_args`` and still causes a different serialized Dag on every parse.
+
+**4. Using environment variables or file contents that change between parses:**
+
+.. code-block:: python
+
+    import os
+    from datetime import datetime
+
+    from airflow.sdk import DAG
+    from airflow.providers.standard.operators.bash import BashOperator
+
+    with DAG(dag_id="bad_env", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
+ # BAD if BUILD_NUMBER changes on every deployment or parse
+ BashOperator(
+ task_id="echo_build",
+ bash_command=f"echo {os.environ.get('BUILD_NUMBER', 'unknown')}",
+ )
+
+How to avoid version inflation
+""""""""""""""""""""""""""""""
+
+* **Use fixed ``start_date`` values.** Always set ``start_date`` to a static ``datetime`` literal:
+
+ .. code-block:: python
+
+ import datetime
+
+ from airflow.sdk import DAG
+
+ with DAG(
+ dag_id="good_example",
+ start_date=datetime.datetime(2024, 1, 1),
+ schedule="@daily",
+ ):
+ ...
+
+* **Keep all Dag and Task constructor arguments deterministic.** Arguments passed to Dag and Operator
+  constructors must produce the same value on every parse. Move any dynamic computation into the
+  ``execute()`` method or use Jinja templates, which are evaluated at task execution time rather than
+  parse time.
+
+* **Use Jinja templates for dynamic values:**
+
+ .. code-block:: python
+
+ from airflow.providers.standard.operators.bash import BashOperator
+
+ # GOOD: the template is resolved at execution time, not parse time
+ BashOperator(
+ task_id="echo_date",
+ bash_command="echo {{ ds }}",
+ )
+
+* **Use Airflow Variables with templates instead of top-level lookups:**
+
+ .. code-block:: python
+
+ from airflow.providers.standard.operators.bash import BashOperator
+
+ # GOOD: Variable is resolved at execution time via template
+ BashOperator(
+ task_id="echo_var",
+ bash_command="echo {{ var.value.my_variable }}",
+ )
+
+Dag version inflation detection
+""""""""""""""""""""""""""""""""
+
+Starting from Airflow 3.2, the Dag processor performs **AST-based static analysis** on every Dag file
+before parsing to detect runtime-varying values in Dag and Task constructors. When a potential issue is
+found, it is surfaced as a **Dag warning** visible in the UI.
+
+You can control this behavior with the
+:ref:`dag_version_inflation_check_level <config:dag_processor__dag_version_inflation_check_level>`
+configuration option:
+
+* ``off`` — Disables the check entirely. No errors or warnings are generated.
+* ``warning`` (default) — Dags load normally but warnings are displayed in the UI when issues are detected.
+* ``error`` — Treats detected issues as Dag import errors, preventing the Dag from loading.
Review Comment:
Probably could mention the ruff rule https://docs.astral.sh/ruff/rules/airflow3-dag-dynamic-value/ here?
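The AST-based analysis described in this section can be approximated with the standard ``ast`` module, in the same spirit as the ruff rule linked above. This is only an illustrative sketch of the idea, not the actual Airflow 3.2 implementation:

```python
import ast

# Calls known to vary between parses (illustrative subset).
DYNAMIC_CALLS = {("datetime", "now"), ("pendulum", "now")}


def find_dynamic_args(source: str) -> list:
    """Return line numbers where a parse-time-varying call appears
    inside a DAG(...) keyword argument. Illustrative only."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and getattr(node.func, "id", None) == "DAG":
            for kw in node.keywords:
                for sub in ast.walk(kw.value):
                    if (
                        isinstance(sub, ast.Call)
                        and isinstance(sub.func, ast.Attribute)
                        and isinstance(sub.func.value, ast.Name)
                        and (sub.func.value.id, sub.func.attr) in DYNAMIC_CALLS
                    ):
                        hits.append(sub.lineno)
    return hits


src = (
    "from datetime import datetime\n"
    "from airflow.sdk import DAG\n"
    "dag = DAG(dag_id='x', start_date=datetime.now())\n"
)
print(find_dynamic_args(src))  # [3]
```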
Review Comment:
```suggestion
        BashOperator(
            task_id="echo_build",
            # BAD if BUILD_NUMBER changes on every deployment or parse
            bash_command=f"echo {os.environ.get('BUILD_NUMBER', 'unknown')}",
        )
```
Review Comment:
```suggestion
      BashOperator(
          task_id="echo_date",
          # GOOD: the template is resolved at execution time, not parse time
          bash_command="echo {{ ds }}",
      )

* **Use Airflow Variables with templates instead of top-level lookups:**

  .. code-block:: python

      from airflow.providers.standard.operators.bash import BashOperator

      BashOperator(
          task_id="echo_var",
          # GOOD: Variable is resolved at execution time via template
          bash_command="echo {{ var.value.my_variable }}",
      )
```
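Both GOOD examples rely on the same timing distinction: the Dag stores a constant template string at parse time, and the dynamic value is only filled in at execution time. Here `string.Template` stands in for Airflow's Jinja templating (an analogy, not the real rendering engine):

```python
from string import Template

# Parse time: the command is a constant string, so the serialized Dag
# is identical on every parse.
command = Template("echo $ds")

# Execution time: the runtime context supplies the dynamic value.
runtime_context = {"ds": "2024-01-01"}  # hypothetical logical date
print(command.substitute(runtime_context))  # echo 2024-01-01
```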
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]