akomisarek opened a new issue, #28342:
URL: https://github.com/apache/airflow/issues/28342

   ### Apache Airflow version
   
   2.5.0
   
   ### What happened
   
   The _updated_at_ for **Datasets** is automatically updated every 30 seconds 
or so, despite no changes to or executions of the DAGs. 
   
    
   
   ### What you think should happen instead
   
   The **Datasets** _updated_at_ column should be updated only when the 
Dataset itself is modified, IDEALLY when the Dataset is updated by 
producing tasks. 
   
   If the column has a different meaning, maybe another column should provide 
that information, i.e. a way to determine the "freshness" of the dataset? 
   
   
   ### How to reproduce
   
   1. Install fresh Airflow instance 
(https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html)
   2. Enable 
[dataset_produces_1](http://localhost:8080/dags/dataset_produces_1/grid) DAG
   3. Wait until the execution of dataset_produces_1 has finished
   4. Invoke `curl --location --request GET 
'localhost:8080/api/v1/datasets/s3%3A%2F%2Fdag1%2Foutput_1.txt' --header 
'Authorization: Basic YWlyZmxvdzphaXJmbG93'`
   5. Note updated_at
   6. Rerun the curl after one minute or so
   7. Observe new updated_at
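
Steps 4 to 7 could also be scripted. The following is only a minimal sketch, assuming the webserver runs on `localhost:8080` and the `airflow`/`airflow` credentials encoded in the curl header above; the function names are hypothetical helpers, not part of Airflow.

```python
import base64
import json
import time
import urllib.parse
import urllib.request

# Assumption: address of the quickstart webserver, as used in the curl command.
BASE_URL = "http://localhost:8080"


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of the Authorization header for HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def dataset_url(base_url: str, dataset_uri: str) -> str:
    """Percent-encode the dataset URI (including ':' and '/') into the path."""
    encoded = urllib.parse.quote(dataset_uri, safe="")
    return f"{base_url}/api/v1/datasets/{encoded}"


def fetch_updated_at(dataset_uri: str, user: str = "airflow",
                     password: str = "airflow") -> str:
    """Return the updated_at field of a single dataset (step 4 and 5)."""
    request = urllib.request.Request(
        dataset_url(BASE_URL, dataset_uri),
        headers={"Authorization": basic_auth_header(user, password)},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["updated_at"]


def updated_at_changed(dataset_uri: str, wait_seconds: int = 60) -> bool:
    """Read updated_at twice, wait_seconds apart, and report any change."""
    first = fetch_updated_at(dataset_uri)
    time.sleep(wait_seconds)
    second = fetch_updated_at(dataset_uri)
    return first != second
```

With no DAG runs in between, calling `updated_at_changed("s3://dag1/output_1.txt")` would be expected to return `False`; on the setup described here it should return `True`.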
   
   The changes can also be observed by connecting to the Airflow DB directly 
and running `select * from dataset`, but that requires exposing the 
PostgreSQL service, so it's more difficult to set up. 
   
   I have a hunch that steps 2/3 might not even be required. 
   
   
   
   ### Operating System
   
   macOS Monterey 12.6.1
   
   ### Versions of Apache Airflow Providers
   
   N/A - using basic Airflow installation via docker
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   N/A - basic setup following the quickstart.
   
   ### Anything else
   
   Context for my request
   
   We have a multi-setup Airflow environment and need to set up some 
dependencies between them. I think the Dataset approach with a custom Sensor 
that queries the API is promising, but the results of querying the API are 
not what I expected. Ideally, there should be correct _updated_at_ 
information on the **Dataset** level. 
   
   Currently, the only place where I seem to be able to get the information I 
am looking for is another API: `localhost:8080/api/v1/datasets/events`, but it 
doesn't seem to allow filtering by the dataset_uri, which seems weird! (Is 
such a filter being considered?) 
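
As a workaround, a custom Sensor could fetch the events and filter them on the client side. The sketch below is only an illustration under assumptions: it assumes the response wraps the events in a `dataset_events` list and that each event carries `dataset_uri` and `timestamp` fields, and it only looks at whatever the first page of results contains. All function names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime
from typing import Iterable, Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; older Pythons reject a trailing 'Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_event_timestamp(events: Iterable[dict],
                           dataset_uri: str) -> Optional[datetime]:
    """Return the newest event timestamp for one dataset, or None."""
    stamps = [
        parse_ts(event["timestamp"])
        for event in events
        if event.get("dataset_uri") == dataset_uri
    ]
    return max(stamps, default=None)


def is_fresh(events: Iterable[dict], dataset_uri: str,
             since: datetime) -> bool:
    """True if the dataset received an event strictly after `since`."""
    latest = latest_event_timestamp(events, dataset_uri)
    return latest is not None and latest > since


def fetch_events(base_url: str, auth_header: str) -> list:
    """Fetch dataset events; the 'dataset_events' key is an assumption."""
    request = urllib.request.Request(
        f"{base_url}/api/v1/datasets/events",
        headers={"Authorization": auth_header},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("dataset_events", [])
```

A Sensor's poke could then call `is_fresh(fetch_events(...), "s3://dag1/output_1.txt", since)` with the time of its last successful check, which avoids relying on the Dataset-level _updated_at_.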
   
   
   
   
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
