karenbraganz commented on issue #52711: URL: https://github.com/apache/airflow/issues/52711#issuecomment-3030015203
I have seen this happen before when a temporary DAG deactivation occurred shortly before the dataset update. The dataset got updated, so you see a timestamp, but the DAG was deactivated during the update, so the update did not count towards the next DAG run. The DAG got activated again shortly after, so the deactivation was not obvious outside of the logs.

To confirm whether this is happening in your case, I recommend taking these steps:

1. Dump all data from the `dataset_dag_run_queue`, `dataset_event`, and `dataset` tables in the metadata DB. You need to do this while the issue is actively occurring because `dataset_dag_run_queue` only stores data temporarily.

2. From `dataset_dag_run_queue`, identify which dataset updates are in the queue for the DAG run. You may run the SQL query below and use the `dataset_id` values to look up the datasets in the `dataset` table. From this, you can see which dataset is missing.

   ```
   SELECT * FROM dataset_dag_run_queue WHERE target_dag_id = '<your-dag-id>';
   ```

3. Confirm the latest update timestamp for the missing dataset from the `dataset_event` table with the SQL query below. This timestamp will likely match the timestamp you see in the UI.

   ```
   SELECT * FROM dataset_event WHERE dataset_id = '<missing-dataset-id>' ORDER BY timestamp DESC;
   ```

4. Check your scheduler logs (or DAG processor logs if you have a standalone DAG processor) around the time of the missing dataset update timestamp. Do you see any DAG deactivations or other issues?

Alternatively, if you do not want to dump and analyze data from the DB, you could check the scheduler/DAG processor logs shortly before each timestamp shown in the UI for DAG deactivations.
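If you have already dumped the tables, the comparison in steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration under assumptions: the rows are plain dicts keyed by the column names used in the queries above (`dataset_id`, `target_dag_id`, `timestamp`), the list of dataset IDs the DAG is scheduled on is something you supply yourself, and the function names and sample values are hypothetical, not part of Airflow.

```python
from datetime import datetime


def find_missing_datasets(expected_dataset_ids, queue_rows, target_dag_id):
    """Return the expected dataset ids that have no queue row for the DAG."""
    queued = {
        row["dataset_id"]
        for row in queue_rows
        if row["target_dag_id"] == target_dag_id
    }
    return sorted(set(expected_dataset_ids) - queued)


def latest_event_timestamp(event_rows, dataset_id):
    """Return the most recent event timestamp for a dataset, or None."""
    stamps = [
        row["timestamp"] for row in event_rows if row["dataset_id"] == dataset_id
    ]
    return max(stamps, default=None)


# Hypothetical dump contents, for illustration only.
queue_rows = [
    {"dataset_id": 1, "target_dag_id": "my_dag"},
    {"dataset_id": 3, "target_dag_id": "other_dag"},
]
event_rows = [
    {"dataset_id": 1, "timestamp": datetime(2025, 7, 1, 9, 0)},
    {"dataset_id": 2, "timestamp": datetime(2025, 7, 1, 10, 0)},
    {"dataset_id": 2, "timestamp": datetime(2025, 7, 1, 12, 30)},
]

# Dataset ids the DAG is scheduled on (assumed known from the DAG definition).
expected = [1, 2]

missing = find_missing_datasets(expected, queue_rows, "my_dag")
for dataset_id in missing:
    print(dataset_id, latest_event_timestamp(event_rows, dataset_id))
```

The timestamp printed for each missing dataset gives you the point in the scheduler or DAG processor logs to search around for a deactivation.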
