dirrao commented on PR #32249:
URL: https://github.com/apache/airflow/pull/32249#issuecomment-1614196394
Hi @dstandish,
Problem:
When scheduler creates a worker pod for the task, it attaches a label to the
pod. This label is airflow-worker=<scheduler_job_id>. This label is a unique
identifier that indicates which scheduler is tracking this worker pod.
Each scheduler will keep listening for events for only pods that it started.
So for watching the pod events, it uses kubernetes watch api. This watch is
done only for the condition: label_selector: airflow-worker=<my_job_id> So
because of this each scheduler will listen to their own pod events.
So now the following is happening:
1. Lets say scheduler1 with id 1 started a task in worker pod with label
`airflow-worker=1` and its heartbeat got delayed.
2. Now scheduler2 with id 2 thinks that scheduler1 is dead because its
heartbeat was not received on time (even though scheduler 1 is alive).
3. So scheduler2 will adopt tasks of scheduler1. This means scheduler2 will
update the label for the pod with scheduler2's job id i.e label is now updated
from airflow-worker=1 to airflow-worker=2.
4. In the case of label updation, Kubernetes watch API is sending a DELETE
event on airflow-worker=1 and ADDED event on airflow-worker=2 .
5. As scheduler1 is still alive, it will keep listening to all events with
label airflow-worker=1. It gets the event as DELETED as explained in point 4.
So scheduler1 thinks that POD is deleted while it was in running phase and goes
ahead and runs the task failure scenario. One of the step in this is to do pod
cleanup which is to delete the pod to avoid dangling pods. So it sends a delete
request to kubernetes API.
Solution: Change the airflow Kubernetes watch label selector filter from
kwargs = {"label_selector": f"airflow-worker={scheduler_job_id}"} to kwargs =
{"label_selector": "airflow-worker"} and then filter the events in scheduler by
airflow-worker=<my_job_id>.
QA: I have updated the test case to ensure the existing functionality.
However, I am not sure how to write the test cases in case multi scheduler. Can
you share references to it?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]