Asquator commented on issue #26760: URL: https://github.com/apache/airflow/issues/26760#issuecomment-3808123412
I did some research on this; here is exactly what happens.

The context you see in DAG-level callbacks is constructed here:
https://github.com/apache/airflow/blob/056e24e023a32dbcd5d0be9da45dc4eede770916/airflow-core/src/airflow/dag_processing/processor.py#L324-L342

In the normal case, the task details in the context are those of `last_ti`, which comes from the scheduler's request:
https://github.com/apache/airflow/blob/056e24e023a32dbcd5d0be9da45dc4eede770916/airflow-core/src/airflow/dag_processing/processor.py#L327-L334
https://github.com/apache/airflow/blob/056e24e023a32dbcd5d0be9da45dc4eede770916/airflow-core/src/airflow/jobs/scheduler_job_runner.py#L2199-L2211

`last_ti` is computed here:
https://github.com/apache/airflow/blob/056e24e023a32dbcd5d0be9da45dc4eede770916/airflow-core/src/airflow/models/dagrun.py#L1414-L1431

This simply selects the last task in lexicographical order from the entire DAG. I could reproduce it by creating a DAG with several tasks `a`, `b`, `c`, ...: no matter which task actually failed or how the tasks were ordered, the last one alphabetically was passed to the callback context.

I think we need to change this behavior to either include one or more tasks in the failed state from the DAG, or pass no task at all. Actually, if you wanted to know which task failed, you'd probably use task-level callbacks injected through `default_args`, so passing no task info to DAG-level callbacks is pretty logical here, though it would break backward compatibility.

I want to submit a PR to fix this and https://github.com/apache/airflow/issues/61119.
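To make the difference concrete, here is a minimal sketch in plain Python. It is a simplified model of the selection described above, not Airflow's actual code: the `TaskInstance` class and both helper functions are hypothetical stand-ins, and I am assuming that "lexicographical order" means ordering by task id.

```python
from dataclasses import dataclass


@dataclass
class TaskInstance:
    """Hypothetical stand-in for a task instance (not the real Airflow model)."""
    task_id: str
    state: str


def last_ti_lexicographic(tis):
    """Model of the reported behavior: pick the alphabetically last task id."""
    return max(tis, key=lambda ti: ti.task_id)


def failed_tis(tis):
    """One possible alternative: return every task instance in the failed state."""
    return [ti for ti in tis if ti.state == "failed"]


# A run where task `b` failed and its downstream task `c` never ran.
tis = [
    TaskInstance("a", "success"),
    TaskInstance("b", "failed"),
    TaskInstance("c", "upstream_failed"),
]

print(last_ti_lexicographic(tis).task_id)      # c
print([ti.task_id for ti in failed_tis(tis)])  # ['b']
```

In this model the callback context would point at `c` even though `b` is the task that failed, which matches what I saw when reproducing the issue.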
