karenbraganz commented on issue #52711:
URL: https://github.com/apache/airflow/issues/52711#issuecomment-3030015203

   I have seen this happen before when a temporary DAG deactivation occurred 
shortly before the dataset update. The dataset got updated, so you see a 
timestamp, but the DAG was deactivated during the update, so it did not count 
towards the next DAG run. The DAG got activated again shortly after, so the 
deactivation was not obvious outside of the logs.
   
   In order to confirm whether this is happening in your case, I recommend 
taking these steps:
   1. Dump all data from the `dataset_dag_run_queue`, `dataset_event`, and 
`dataset` tables in the metadata DB. It is necessary to do this while the issue 
is actively occurring because `dataset_dag_run_queue` only stores data 
temporarily. 
   2. From `dataset_dag_run_queue`, identify which dataset updates are in the 
queue for the DAG run. You may run the SQL query below and use the `dataset_id` 
values to look up the datasets in the `dataset` table. From this, you can see 
which dataset is missing.
   ```
   SELECT * FROM dataset_dag_run_queue
   WHERE target_dag_id = '<your-dag-id>';
   ```
   3. Confirm the latest update timestamp for the missing dataset from the 
`dataset_event` table with the below SQL query. This timestamp will likely 
match the timestamp you see on the UI.
   ```
   SELECT * FROM dataset_event
   WHERE dataset_id = '<missing-dataset-id>'
   ORDER BY timestamp DESC;
   ```
   4. Check your scheduler logs (or DAG processor logs if you have a standalone 
DAG processor) around the time of the missing dataset update timestamp. Do you 
see any DAG deactivations or other issues? 
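
As a rough sketch of step 2, the lookup of the missing dataset can be done in one query instead of by eye. This assumes the Airflow 2.x metadata schema, in particular a `dag_schedule_dataset_reference` table (columns `dataset_id`, `dag_id`) that lists the datasets a DAG is scheduled on; that table is not mentioned above, so please verify it exists in your version. The in-memory SQLite database, the sample rows, and the helper names `find_missing_datasets` and `build_demo_db` are hypothetical and only there to make the example runnable.

```python
import sqlite3

# The SELECT is the part intended for the real metadata DB. The table
# dag_schedule_dataset_reference is an assumption about the Airflow 2.x schema.
MISSING_DATASETS_SQL = """
SELECT d.id, d.uri
FROM dag_schedule_dataset_reference AS r
JOIN dataset AS d ON d.id = r.dataset_id
LEFT JOIN dataset_dag_run_queue AS q
       ON q.dataset_id = r.dataset_id
      AND q.target_dag_id = r.dag_id
WHERE r.dag_id = ?
  AND q.dataset_id IS NULL
ORDER BY d.id
"""


def find_missing_datasets(conn, dag_id):
    """Return (id, uri) for datasets the DAG waits on that are not queued."""
    return conn.execute(MISSING_DATASETS_SQL, (dag_id,)).fetchall()


def build_demo_db():
    """Create a toy stand-in for the metadata DB (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dataset (id INTEGER PRIMARY KEY, uri TEXT);
        CREATE TABLE dag_schedule_dataset_reference (dataset_id INTEGER, dag_id TEXT);
        CREATE TABLE dataset_dag_run_queue (dataset_id INTEGER, target_dag_id TEXT);
        INSERT INTO dataset VALUES (1, 's3://bucket/a'), (2, 's3://bucket/b');
        INSERT INTO dag_schedule_dataset_reference
            VALUES (1, 'consumer'), (2, 'consumer');
        -- Only dataset 1 has been queued for the consumer DAG.
        INSERT INTO dataset_dag_run_queue VALUES (1, 'consumer');
        """
    )
    return conn


conn = build_demo_db()
print(find_missing_datasets(conn, "consumer"))  # [(2, 's3://bucket/b')]
```

Against a real Postgres or MySQL metadata DB you would run only the SELECT, replacing `?` with your DAG id (or with the placeholder style of your driver). As in step 1, it has to be run while the issue is occurring.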
   
   Alternatively, if you do not want to dump and analyze data from the DB, you 
could check the scheduler/DAG processor logs before each timestamp shown in 
the UI for DAG deactivations.

