eejbyfeldt commented on issue #30146: URL: https://github.com/apache/airflow/issues/30146#issuecomment-1472735826
That change does not look correct to me. The current condition that determines if the on_failure callback is called is https://github.com/apache/airflow/blob/699171679a79917e9c2ee798f71a01d3051f3cc4/airflow/models/taskinstance.py#L1895 So the correct condition for deciding the callback would need to replicate that logic? Also not reraising the exception seems very problematic to me. The reason it currently called twice is that after reraiseing we end up here https://github.com/apache/airflow/blob/699171679a79917e9c2ee798f71a01d3051f3cc4/airflow/models/taskinstance.py#L1465-L1479 which handle updating the state and running the failure fallback. Just swallowing the exception like in the patch you posted will result in the task being incorrectly marked as success. Seems to me the reason for us not calling the `on_failure_callback` on cases where the task was externally marked as success/failed is this branch: https://github.com/apache/airflow/blob/699171679a79917e9c2ee798f71a01d3051f3cc4/airflow/models/taskinstance.py#L1468-L1475 so it seems like the correct fix would be to call the callbacks from there. Because that is the bug that PR was aiming to solve right? Or did I misunderstand the goal? It seems incorrect to me that assuming SIGTERM is the same thing being marked failed externally. As SIGTERM can be received for other reason, like the pod-preemption when running on k8s, and for such cases we want to go through the normal failure handling code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
