ephraimbuddy opened a new pull request #15336:
URL: https://github.com/apache/airflow/pull/15336
Currently, when a container inside a pod terminates, Airflow doesn't know about it and the task remains queued. The Kubernetes job watcher does not watch the status of containers inside pods; it only watches the pod and reports the pod's status to Airflow.
According to the Kubernetes documentation, the Pending phase of a pod includes the time the Pod spends waiting to be scheduled as well as the time spent downloading container images over the network. A network failure can therefore cause the container to fail while the pod stays Pending for some time before it is deleted.
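To make the gap concrete, here is a minimal Python sketch, assuming the pod is represented as a plain dict shaped like the raw Kubernetes API JSON (`status.phase`, `status.containerStatuses[].state.waiting.reason`). The helper `phase_only_state` is hypothetical and not Airflow code; it only shows what a watcher that reads nothing but the pod phase would see for a pod whose image pull failed.

```python
# Hypothetical pod status, shaped like the JSON the Kubernetes API returns
# for a Pod whose image pull failed (field names follow core/v1 PodStatus).
pending_pod = {
    "status": {
        "phase": "Pending",
        "containerStatuses": [
            {
                "name": "base",
                "state": {"waiting": {"reason": "ErrImagePull"}},
            }
        ],
    }
}


def phase_only_state(pod: dict) -> str:
    """Report only the pod phase, ignoring container statuses (hypothetical helper)."""
    return pod["status"]["phase"]


# A phase-only watcher sees nothing wrong: the pod is simply "Pending",
# so the task would stay queued even though its container cannot start.
print(phase_only_state(pending_pod))
```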
This PR fixes this by having the Kubernetes job watcher also watch the statuses of the containers inside the pod.
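One way container watching could look is sketched below in Python. This is a hypothetical illustration, not the code in this PR: `container_failure`, `watcher_state`, and the `FAILED_WAITING_REASONS` set are assumptions, and pods are plain dicts shaped like the Kubernetes API JSON. A Pending pod is reported as failed if any container is stuck in a waiting reason assumed to be unrecoverable, or has terminated with a non-zero exit code.

```python
from typing import Optional

# Assumed set of waiting reasons treated as unrecoverable; this list is
# illustrative, not the one used by Airflow.
FAILED_WAITING_REASONS = {
    "ErrImagePull",
    "ImagePullBackOff",
    "InvalidImageName",
    "CreateContainerConfigError",
    "CrashLoopBackOff",
}


def container_failure(pod: dict) -> Optional[str]:
    """Return a failure description if any container has failed, else None (hypothetical)."""
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for cs in statuses:
        state = cs.get("state") or {}
        # A container stuck waiting for an unrecoverable reason counts as failed.
        waiting = state.get("waiting")
        if waiting and waiting.get("reason") in FAILED_WAITING_REASONS:
            return f"{cs['name']}: {waiting['reason']}"
        # A container that already terminated with a non-zero exit code also counts.
        terminated = state.get("terminated")
        if terminated and terminated.get("exitCode", 0) != 0:
            return f"{cs['name']}: exited with code {terminated['exitCode']}"
    return None


def watcher_state(pod: dict) -> str:
    """Combine the pod phase with container statuses (hypothetical helper)."""
    phase = pod["status"]["phase"]
    # Only override Pending: that is the phase where container failures hide.
    if phase == "Pending" and container_failure(pod) is not None:
        return "Failed"
    return phase


failed_pod = {"status": {"phase": "Pending", "containerStatuses": [
    {"name": "base", "state": {"waiting": {"reason": "ImagePullBackOff"}}}]}}
starting_pod = {"status": {"phase": "Pending", "containerStatuses": [
    {"name": "base", "state": {"waiting": {"reason": "ContainerCreating"}}}]}}
print(watcher_state(failed_pod))    # Failed
print(watcher_state(starting_pod))  # Pending
```

Reasons such as `ContainerCreating` are deliberately left out of the set, so a pod that is merely slow to start keeps its Pending state instead of being failed early.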
This should hopefully close https://github.com/apache/airflow/issues/13542 and https://github.com/apache/airflow/issues/15218. I prefer this approach to a timeout.
cc: @jedcunningham
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]