akomisarek opened a new issue, #28342:
URL: https://github.com/apache/airflow/issues/28342
### Apache Airflow version
2.5.0
### What happened
The _updated_at_ for **Datasets** is automatically updated roughly every
30 seconds, even though no DAGs have changed or executed.
### What you think should happen instead
The **Datasets** _updated_at_ column should be updated only when the
Dataset itself is modified, ideally only when the Dataset is updated by a
producing task.
If the column has a different meaning, maybe another column should provide
that information, i.e. a way to determine the "freshness" of the dataset?
### How to reproduce
1. Install a fresh Airflow instance
(https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html)
2. Enable the
[dataset_produces_1](http://localhost:8080/dags/dataset_produces_1/grid) DAG
3. Wait until the execution of dataset_produces_1 has finished
4. Invoke `curl --location --request GET
'localhost:8080/api/v1/datasets/s3%3A%2F%2Fdag1%2Foutput_1.txt' --header
'Authorization: Basic YWlyZmxvdzphaXJmbG93'`
5. Note updated_at
6. Rerun the curl after one minute or so
7. Observe new updated_at
The changes can also be observed by connecting to the Airflow DB directly
and running `select * from dataset`, but that requires exposing the
PostgreSQL service, so it's more difficult to set up.
I have a hunch that steps 2/3 might not even be required.
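
The curl steps above can be sketched as a small script, in case it helps
anyone reproduce this without copying timestamps by hand. This is only a
sketch: the helper names (`dataset_url`, `fetch_updated_at`, `observe_drift`)
are made up for illustration, and it assumes the default `airflow`/`airflow`
credentials from the docker-compose quickstart (the same ones encoded in the
`Authorization` header above).

```python
import base64
import json
import time
import urllib.parse
import urllib.request

# Values taken from the reproduction steps; adjust for your own setup.
BASE_URL = "http://localhost:8080/api/v1"
DATASET_URI = "s3://dag1/output_1.txt"


def dataset_url(base_url, dataset_uri):
    """Build the single-dataset endpoint URL, percent-encoding the URI."""
    return f"{base_url}/datasets/{urllib.parse.quote(dataset_uri, safe='')}"


def fetch_updated_at(url, user="airflow", password="airflow"):
    """GET the dataset and return its updated_at field (basic auth assumed)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["updated_at"]


def observe_drift(fetch, wait_seconds=60, sleep=time.sleep):
    """Call fetch twice, wait_seconds apart, and report whether the value changed."""
    first = fetch()
    sleep(wait_seconds)
    second = fetch()
    return first, second, first != second


# Example (needs the running docker-compose instance):
# url = dataset_url(BASE_URL, DATASET_URI)
# print(observe_drift(lambda: fetch_updated_at(url)))
```

With no DAG runs in between, the third element of the returned tuple should
be `False` if _updated_at_ behaved as I expect; on my setup it would come
back `True`.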
### Operating System
macOS Monterey 12.6.1
### Versions of Apache Airflow Providers
N/A - using basic Airflow installation via docker
### Deployment
Docker-Compose
### Deployment details
N/A - basic setup following the quickstart.
### Anything else
Context for my request:
We have a multi-setup Airflow environment and need to set up some
dependencies between the setups. I think the Dataset approach combined with
a custom Sensor that queries the API is promising, but the results of
querying the API are not what I expected.
Ideally, there should be correct _updated_at_ information at the
**Dataset** level.
Currently, the only place where I seem to be able to get the information I
am looking for is another API: `localhost:8080/api/v1/datasets/events`, but it
doesn't seem to allow filtering by the dataset_uri, which seems weird! (Is
this being considered?)
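
As a stopgap, the filtering could be done client-side on the events
response. The sketch below is an assumption-heavy illustration, not a
confirmed contract: it assumes the response body has a `dataset_events` list
whose items carry `dataset_uri` and an ISO 8601 `timestamp` with an explicit
offset, and the function names are hypothetical.

```python
from datetime import datetime


def filter_events_by_uri(payload, dataset_uri):
    """Keep only the events whose dataset_uri matches (assumed field names)."""
    events = payload.get("dataset_events", [])
    return [event for event in events if event.get("dataset_uri") == dataset_uri]


def latest_event_time(payload, dataset_uri):
    """Return the newest event timestamp for the URI, or None if there is none."""
    timestamps = [
        datetime.fromisoformat(event["timestamp"])
        for event in filter_events_by_uri(payload, dataset_uri)
        if event.get("timestamp")
    ]
    return max(timestamps, default=None)
```

A custom Sensor could compare the value from `latest_event_time` against the
last time it fired, which is the "freshness" check I was hoping to get from
_updated_at_ directly. Note this only sees the events on the page that was
fetched, so pagination would still need handling.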
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)