peloyeje opened a new pull request, #40891:
URL: https://github.com/apache/airflow/pull/40891

   Fix based on a real-world issue seen in production, where a failing pod does not make the associated task fail.
   
   Running theory: when a pod fails while in `self.pod_manager.fetch_container_logs`, the `running` property of the returned `pod_log_status` object is False, so we skip the deferrable call and jump directly to `self._clean`.
   However, the `event` object is never refreshed and still carries the `running` status, so execution hits this code path:

   https://github.com/apache/airflow/blob/4cbfcd72a27086485682a93ed7be8ef75ecfde88/airflow/providers/cncf/kubernetes/operators/pod.py#L793-L794

   As a result, the task instance returns without error.
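   To make the running theory concrete, here is a minimal simulation in plain Python. It is not Airflow's code: `PodLoggingStatus`, `fetch_container_logs`, `clean` and `trigger_reentry_buggy` are hypothetical stand-ins named after the objects mentioned above, and the event status strings (`"running"`, `"failed"`, and so on) are assumptions. It only shows why deciding the outcome from a stale `event` lets a failed pod end as a success.

```python
from dataclasses import dataclass

# Assumption: event statuses that should fail the task.
FAILURE_STATUSES = {"error", "failed", "timeout"}


@dataclass
class PodLoggingStatus:
    """Hypothetical stand-in for the object returned by fetch_container_logs."""
    running: bool


def fetch_container_logs(pod_phase: str) -> PodLoggingStatus:
    # Hypothetical: streams logs, then reports whether the pod still runs.
    return PodLoggingStatus(running=pod_phase == "Running")


def clean(event: dict) -> str:
    # Hypothetical: the outcome is decided from the trigger event only.
    if event["status"] in FAILURE_STATUSES:
        return "task failed"
    return "task succeeded"


def trigger_reentry_buggy(event: dict, pod_phase: str) -> str:
    """Simplified model of the suspected bug."""
    if event["status"] == "running":
        pod_log_status = fetch_container_logs(pod_phase)
        if pod_log_status.running:
            return "deferred"
        # Pod stopped while logs were fetched: skip defer, go to cleanup.
    # `event` was never refreshed, so it may still say "running" here.
    return clean(event)
```

   With this model, `trigger_reentry_buggy({"status": "running"}, "Failed")` returns `"task succeeded"`: the pod failed, but the stale `running` event is what reaches the cleanup step.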
   
   Proposed fix: call defer after fetching logs regardless of the pod status, so that the failed status is picked up during the next trigger run. It adds a bit of delay to pod completion detection, but it is simple/stupid :)
   
   
   https://github.com/apache/airflow/blob/98c5a3a2c6d1df722d56bb3748dfbc810d5952aa/airflow/providers/cncf/kubernetes/utils/pod_manager.py#L397 is running
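   The proposed fix can be sketched with the same kind of simplified model. Again, none of this is Airflow's implementation: `trigger_event`, `trigger_reentry_fixed` and `run` are hypothetical helpers, and the mapping from pod phase to event status is an assumption. The point is that re-deferring unconditionally lets the next trigger run emit a fresh event carrying the real pod status.

```python
from dataclasses import dataclass

# Assumption: event statuses that should fail the task.
FAILURE_STATUSES = {"error", "failed", "timeout"}


@dataclass
class PodLoggingStatus:
    """Hypothetical stand-in for the object returned by fetch_container_logs."""
    running: bool


def fetch_container_logs(pod_phase: str) -> PodLoggingStatus:
    # Hypothetical: streams logs, then reports whether the pod still runs.
    return PodLoggingStatus(running=pod_phase == "Running")


def trigger_event(pod_phase: str) -> dict:
    # Hypothetical trigger: turns the pod's current phase into an event.
    if pod_phase == "Running":
        return {"status": "running"}
    if pod_phase == "Succeeded":
        return {"status": "success"}
    return {"status": "failed"}


def clean(event: dict) -> str:
    # Hypothetical: the outcome is decided from the trigger event only.
    if event["status"] in FAILURE_STATUSES:
        return "task failed"
    return "task succeeded"


def trigger_reentry_fixed(event: dict, pod_phase: str) -> str:
    """Simplified model of the proposed fix."""
    if event["status"] == "running":
        fetch_container_logs(pod_phase)
        # Defer again whatever `running` says, so the next trigger run
        # emits a fresh event that reflects the pod's real status.
        return "deferred"
    return clean(event)


def run(phases_seen_by_trigger: list[str]) -> str:
    # Each item is the pod phase observed by one trigger run.
    for phase in phases_seen_by_trigger:
        outcome = trigger_reentry_fixed(trigger_event(phase), phase)
        if outcome != "deferred":
            return outcome
    return "deferred"
```

   For a pod that fails after the trigger last saw it running, `run(["Running", "Failed"])` ends in `"task failed"`. The extra trigger round-trip before that outcome is the small delay in completion detection mentioned above.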
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
