ephraimbuddy opened a new pull request #15336:
URL: https://github.com/apache/airflow/pull/15336


   Currently, when a container inside a pod terminates, Airflow doesn't know about it and
   tasks remain queued. The Kubernetes job watcher does not watch the status of containers
   inside pods. It only watches the pod and reports the pod's status to Airflow.
   
   According to the Kubernetes documentation, the Pending phase of a Pod includes the
   time the Pod spends waiting to be scheduled as well as
   the time spent downloading container images over the network.
   
   A network failure can crash the container while the pod remains Pending for some
   time before it is deleted.
   
   This PR fixes this by having the Kubernetes job watcher also watch the status of
   containers inside pods.
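
A minimal sketch of the idea, not the code in this PR: while a pod is still Pending, also look at its container statuses and report a failure as soon as any container has terminated. The `ContainerStatus` dataclass, the `classify_pod` function, and the returned state strings are hypothetical names used only for illustration; the real watcher works with objects from the Kubernetes client.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContainerStatus:
    """Hypothetical stand-in for the container status reported on a pod."""

    name: str
    terminated: bool = False
    reason: Optional[str] = None


def classify_pod(phase: str, container_statuses: Optional[List[ContainerStatus]]) -> str:
    """Map a pod phase plus its container statuses to a task state (illustrative only)."""
    if phase == "Pending":
        # A Pending pod may have no container statuses yet, so treat None as empty.
        for status in container_statuses or []:
            if status.terminated:
                # The container died even though the pod is still Pending.
                return "failed"
        return "queued"
    if phase == "Failed":
        return "failed"
    if phase == "Succeeded":
        return "success"
    if phase == "Running":
        return "running"
    return "unknown"
```

With this check, a Pending pod whose container has terminated is reported as failed instead of leaving the task queued until the pod is eventually deleted.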
   
   This should hopefully close https://github.com/apache/airflow/issues/13542 and
   https://github.com/apache/airflow/issues/15218.
   I prefer this approach to timing out.
   
   cc: @jedcunningham 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

