collinmcnulty opened a new issue, #25254: URL: https://github.com/apache/airflow/issues/25254
### Description

Any time an Airflow component changes the state of a task instance, it should record that change in an audit-log-like table of changes. Thus the user will be able to easily see what happened to their tasks.

| dag_id | task_id | run_id | map_index | state | time_changed | component_type | component_id |
|-------------|--------------------|-------------------|-----------|---------|---------------------|----------------|--------------|
| example_dag | config_file_sensor | scheduled_2022... | -1 | queued | 2022-07-25T12:01:01 | scheduler | <uuid> |
| example_dag | config_file_sensor | scheduled_2022... | -1 | running | 2022-07-25T12:24:01 | worker | <uuid> |

Since task_instance is already one of the biggest tables, this table definitely has the potential to be very big. I think it should probably be off by default with a config flag for turning it on. It seems like it should probably only be used in conjunction with regular runs of `airflow db clean`.

### Use case/motivation

Tracing the lifecycle of a task instance across Airflow component logs is quite tedious and involves effectively building the described table in your head or on a notepad. Many times when I'm trying to understand what happened to a task, such investigation is necessary. It would also help answer questions like "which task instances were in [state] at this particular time in the past".

### Related issues

#25252

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
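To make the proposed table and the "which task instances were in [state] at this particular time" question more concrete, here is a rough sketch using only Python's standard-library `sqlite3`. It is not Airflow code and does not use any Airflow API. The table name `task_instance_state_change`, the helper functions, and the column types are assumptions for illustration; the column names are taken from the example table in the description.

```python
import sqlite3

# Hypothetical schema: column names follow the example table in the
# description; the table name and column types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS task_instance_state_change (
    dag_id         TEXT    NOT NULL,
    task_id        TEXT    NOT NULL,
    run_id         TEXT    NOT NULL,
    map_index      INTEGER NOT NULL DEFAULT -1,
    state          TEXT    NOT NULL,
    time_changed   TEXT    NOT NULL,  -- ISO 8601 string, sorts chronologically
    component_type TEXT    NOT NULL,
    component_id   TEXT    NOT NULL
)
"""


def make_connection():
    """Create an in-memory database holding the hypothetical audit table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def record_state_change(conn, dag_id, task_id, run_id, map_index,
                        state, time_changed, component_type, component_id):
    """Append one row per state change; rows are never updated in place."""
    conn.execute(
        "INSERT INTO task_instance_state_change VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (dag_id, task_id, run_id, map_index,
         state, time_changed, component_type, component_id),
    )


def instances_in_state_at(conn, state, at_time):
    """Return task instances whose latest change at or before at_time set `state`."""
    query = """
    SELECT c.dag_id, c.task_id, c.run_id, c.map_index
    FROM task_instance_state_change AS c
    WHERE c.state = ?
      AND c.time_changed = (
          SELECT MAX(p.time_changed)
          FROM task_instance_state_change AS p
          WHERE p.dag_id = c.dag_id
            AND p.task_id = c.task_id
            AND p.run_id = c.run_id
            AND p.map_index = c.map_index
            AND p.time_changed <= ?
      )
    """
    return conn.execute(query, (state, at_time)).fetchall()
```

With the two example rows recorded, calling `instances_in_state_at(conn, "queued", "2022-07-25T12:10:00")` would return the sensor task, since its most recent change before that time was to `queued`. An append-only design like this is also what makes the table grow quickly, which is why pairing it with regular `airflow db clean` runs seems sensible.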
