lsenjov opened a new pull request #19949: URL: https://github.com/apache/airflow/pull/19949
Background: We run a lot of DAGs and tasks (~600 DAGs with 50 tasks each) on Kubernetes, and the scheduler tends to crash under that load, so pod adoption is common. On adoption, the scheduler would patch the pod, but when a specific field got rewritten (I'm not sure which, sorry), a new pod would be created instead. This new pod would then change the status of the task instance, causing the pod to die with SIGTERM and the task to fail.

This patch makes adoption update only the label, which has removed the symptom (from 300+ daily failures to 0). I only needed to do this for `adopt_launched_task`, but also did it for `adopt_finished_task` for consistency.
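As a rough illustration of "only update the label", here is a minimal sketch, not the code from this PR. It assumes the client object exposes `patch_namespaced_pod(name=..., namespace=..., body=...)` as the Kubernetes Python client's `CoreV1Api` does, and it assumes `airflow-worker` is the label that records which scheduler owns the pod. The helper names `build_label_patch` and `adopt_pod` are hypothetical.

```python
from typing import Any, Dict


def build_label_patch(label_key: str, label_value: str) -> Dict[str, Any]:
    """Return a patch body that names a single metadata label and nothing else."""
    return {"metadata": {"labels": {label_key: label_value}}}


def adopt_pod(kube_client: Any, name: str, namespace: str, scheduler_job_id: str) -> None:
    """Relabel a pod for a new scheduler without resending the rest of the pod."""
    # Assumption: "airflow-worker" is the ownership label; adjust if yours differs.
    body = build_label_patch("airflow-worker", scheduler_job_id)
    # Only the label is in the body, so no other pod field is rewritten by this call.
    kube_client.patch_namespaced_pod(name=name, namespace=namespace, body=body)
```

Sending the whole serialized pod as the patch body risks rewriting fields that were never meant to change. A body restricted to the label should avoid that, assuming the client submits it as a merge-style patch.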
