Gollum999 opened a new issue #22133: URL: https://github.com/apache/airflow/issues/22133
### Apache Airflow version

2.2.3

### What happened

When triggering a DAG manually (via the web UI or via `airflow dags trigger`), some template params like `ds`, `ts`, and others derived from `dag_run.logical_date` will be set to the specified execution timestamp. This is inconsistent with automated runs, where those fields are set to `data_interval_start`. This behavior contradicts the documentation in a few places, and can cause tasks that depend on those template params to behave unintuitively.

### What you expected to happen

I expected `ds` to always equal `data_interval_start`. Quoting the docs in a few different places (emphasis mine):

[DAG Runs: Data Interval](https://airflow.apache.org/docs/apache-airflow/stable/dag-run.html#data-interval)

> The “logical date” (also called `execution_date` in Airflow versions prior to 2.2) of a DAG run, for example, **denotes the start of the data interval**, not when the DAG is actually executed.

[FAQ: What does `execution_date` mean?](https://airflow.apache.org/docs/apache-airflow/stable/faq.html#what-does-execution-date-mean)

> Note that `ds` (**the YYYY-MM-DD form of `data_interval_start`**) refers to date *string*, not date *start* as may be confusing to some.

However, it's worth noting that [DAGs: Running DAGs](https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#running-dags) *does* seem to explain this edge case:

> For example, if a DAG run is manually triggered by the user, its logical date would be the date and time of which the DAG run was triggered, and the value should be equal to DAG run’s start date. However, when the DAG is being automatically scheduled, with certain schedule interval put in place, the logical date is going to indicate the time at which it marks the start of the data interval, where the DAG run’s start date would then be the logical date + scheduled interval.
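To make the inconsistency concrete, here is a minimal stdlib-only sketch (my own simplification, not Airflow's actual implementation) of how `ds`/`ts` appear to be derived from `logical_date` rather than from `data_interval_start`:

```python
from datetime import datetime, timezone

def template_context(logical_date, data_interval_start):
    # Simplified model of what I observe: ds/ts are derived from
    # logical_date, not from data_interval_start.
    return {
        "ds": logical_date.strftime("%Y-%m-%d"),
        "ts": logical_date.isoformat(),
        "data_interval_start": data_interval_start,
    }

# Scheduled weekly run: logical_date equals data_interval_start,
# so everything agrees.
interval_start = datetime(2022, 2, 27, tzinfo=timezone.utc)
scheduled = template_context(interval_start, interval_start)

# Manual trigger: logical_date is the trigger time, while the data
# interval still "snaps" back to the last complete week.
trigger_time = datetime(2022, 3, 8, 22, 23, 58, tzinfo=timezone.utc)
manual = template_context(trigger_time, interval_start)

print(scheduled["ds"])  # 2022-02-27
print(manual["ds"])     # 2022-03-08, outside the data interval
```

Under this model, `manual["ds"]` can never fall inside the data interval the run is nominally covering, which is exactly the behavior reproduced below.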
### How to reproduce

Example DAG:

```python
#!/usr/bin/env python3
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    'retries': 0,
}

with DAG(
    'test_dag',
    default_args=default_args,
    schedule_interval='@weekly',
    start_date=datetime(2022, 1, 1),
    catchup=False,
) as dag:
    BashOperator(task_id='task', bash_command="""echo "
        ds:                  {{ ds }}
        prev_ds:             {{ prev_ds }}
        next_ds:             {{ next_ds }}
        ts:                  {{ ts }}
        execution_date:      {{ execution_date }}
        data_interval_start: {{ data_interval_start }}
        data_interval_end:   {{ data_interval_end }}
        dag_run.logical_date {{ dag_run.logical_date }}
    " """)
```

Trigger this DAG via the web UI or via `airflow dags trigger test_dag -e <some timestamp>`, then look at the output in the logs.

Example output for an automated run:

```
[2022-03-08, 10:31:21 CST] {subprocess.py:89} INFO - ds: 2022-02-27
[2022-03-08, 10:31:21 CST] {subprocess.py:89} INFO - prev_ds: 2022-02-20
[2022-03-08, 10:31:21 CST] {subprocess.py:89} INFO - next_ds: 2022-03-06
[2022-03-08, 10:31:21 CST] {subprocess.py:89} INFO - ts: 2022-02-27T00:00:00+00:00
[2022-03-08, 10:31:21 CST] {subprocess.py:89} INFO - execution_date: 2022-02-27T00:00:00+00:00
[2022-03-08, 10:31:21 CST] {subprocess.py:89} INFO - data_interval_start: 2022-02-27T00:00:00+00:00
[2022-03-08, 10:31:21 CST] {subprocess.py:89} INFO - data_interval_end: 2022-03-06T00:00:00+00:00
[2022-03-08, 10:31:21 CST] {subprocess.py:89} INFO - dag_run.logical_date 2022-02-27 00:00:00+00:00
```

Example output for a manually-triggered run:

```
[2022-03-08, 10:31:27 CST] {subprocess.py:89} INFO - ds: 2022-03-08
[2022-03-08, 10:31:27 CST] {subprocess.py:89} INFO - prev_ds: 2022-03-08
[2022-03-08, 10:31:27 CST] {subprocess.py:89} INFO - next_ds: 2022-03-08
[2022-03-08, 10:31:27 CST] {subprocess.py:89} INFO - ts: 2022-03-08T22:23:58+00:00
[2022-03-08, 10:31:27 CST] {subprocess.py:89} INFO - execution_date: 2022-03-08T22:23:58+00:00
[2022-03-08, 10:31:27 CST] {subprocess.py:89} INFO - data_interval_start: 2022-02-27T00:00:00+00:00
[2022-03-08, 10:31:27 CST] {subprocess.py:89} INFO - data_interval_end: 2022-03-06T00:00:00+00:00
[2022-03-08, 10:31:27 CST] {subprocess.py:89} INFO - dag_run.logical_date 2022-03-08 22:23:58+00:00
```

### Operating System

CentOS 7.4

### Versions of Apache Airflow Providers

Only the defaults.

### Deployment

Other

### Deployment details

Just running processes locally.

### Anything else

I'm not convinced that this is just a documentation issue; the fact that `logical_date` and all derived fields can have contextually different meanings seems fundamentally broken to me. To keep my users from running into issues, I feel like I am forced to teach them either "never use `ds`/`ts`/etc." or "never trigger DAGs manually", neither of which feels great. As far as I can tell, there is no way to manually trigger a DAG and have it behave exactly like a "normal" automated run, since `ds` will always fall outside of the data interval.

Which raises the question: what does it even mean to manually trigger a DAG run when data intervals are involved? It shouldn't be able to affect the existing schedule, so the current behavior of "snapping" to the latest complete data interval makes sense to me. But for consistency, I think all `dag_run` fields (except for things like `run_id`) should follow that same behavior.

Alternatively, maybe there are two classes of DAGs: ones that operate on data intervals, and ones that operate on a single instant in time (e.g. `schedule_interval=None`). Perhaps the former should never be manually triggered, and should only ever be run for specific intervals via something like `airflow dags backfill`. Ideally the web UI and CLI would reflect this, to prevent running a DAG "the wrong way".

Admittedly I am new to Airflow, so maybe my intuitions are not correct. And I recognize that there are almost certainly some users that depend on the current behavior, so it would definitely be a pain to change.
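For now, the only workaround I see is to avoid `ds`/`ts` entirely and derive date strings from `data_interval_start` instead, e.g. `{{ data_interval_start.strftime('%Y-%m-%d') }}` in the example DAG above (assuming, as in my setup, that `data_interval_start` renders as a datetime-like object that supports `strftime`). A stdlib-only sketch of the idea, with a helper name of my own invention:

```python
from datetime import datetime, timezone

def stable_ds_ts(data_interval_start):
    """Hypothetical helper: build ds/ts-style strings from
    data_interval_start instead of logical_date, so that manual and
    scheduled runs over the same interval agree."""
    return (
        data_interval_start.strftime("%Y-%m-%d"),
        data_interval_start.isoformat(),
    )

# The values depend only on the interval, not on when the run was
# triggered, so a manual run matches the scheduled run it mirrors.
interval_start = datetime(2022, 2, 27, tzinfo=timezone.utc)
ds, ts = stable_ds_ts(interval_start)
print(ds)  # 2022-02-27
print(ts)  # 2022-02-27T00:00:00+00:00
```

But this just sidesteps the problem for my own DAGs; it doesn't help anyone who reaches for the documented `ds`/`ts` params.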
But I'm curious to hear if other people have thoughts about this, or specific examples of why the current behavior is desirable.

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
