Bowrna commented on PR #39398: URL: https://github.com/apache/airflow/pull/39398#issuecomment-2268244502
> > I have been working on this PR after taking a long break due to personal commitments, and I am stuck on it. From what I have learnt so far:
> >
> > 1. There is a configurable param, `task_queued_timeout`, that sets the maximum time a task may stay in the queued state. This timeout is used in `scheduler_job_runner.py`, where a timer (event_scheduler) periodically executes the function `_fail_tasks_stuck_in_queued`.
> >
> > In this function, we pick out the TIs from the DB that are stuck in the queue, collect the executor and task instances, and ask the executor to clean up the stuck tasks. So the cleanup work goes to the executor, which decides how to handle it. So far this cleanup is implemented only in the Celery, CeleryKubernetes, Kubernetes and LocalKubernetes executors.
> >
> > In the case of the Celery Executor, the task is marked as failed and popped from the active task dict. In the case of the Kubernetes Executor, it only deletes the pods associated with the task instance. In the case of CeleryKubernetes, the TI's queue is checked to see whether it is the Celery or the k8s one, and based on that one (or both) of the above two cleanups is invoked. In the case of LocalKubernetes, the TIs are filtered for k8s and only the cleanup of the k8s executor is invoked, as the local executor doesn't implement any cleanup method.
> >
> > I have a question about the k8s executor: if a task is stuck in the queue for a long time, the cleanup implementation only deletes the pod. I don't see any place where the task is marked as failed (or any other state). Can anyone help me figure out how this works?
> >
> > cc: @potiuk
> >
> > https://github.com/apache/airflow/blob/3805050f34dcb575aaad690c6ad1e37f75f3b2cf/airflow/jobs/scheduler_job_runner.py#L1626-L1660
>
> @collinmcnulty this is the reason for enquiring about the type of executor.

@potiuk A gentle reminder on this, as I am stuck here. Thanks!
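To make the question concrete, here is a minimal toy sketch of the flow as I understand it. It does not import Airflow, and every class and function name in it (`ToyTaskInstance`, `ToyCeleryExecutor`, `ToyKubernetesExecutor`, `fail_tasks_stuck_in_queued`, and the `cleanup_stuck_queued_tasks` method on the toy classes) is a stand-in I made up for illustration. It only models the behaviour described above and may not match what the real executors do.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ToyTaskInstance:
    """Hypothetical stand-in for a task instance row in the DB."""
    key: str
    executor: str        # name of the executor the task was sent to
    queued_at: datetime
    state: str = "queued"


class ToyCeleryExecutor:
    """Assumed Celery-style cleanup: mark failed and drop from the active dict."""

    def __init__(self):
        self.running = {}

    def cleanup_stuck_queued_tasks(self, tis):
        for ti in tis:
            ti.state = "failed"
            self.running.pop(ti.key, None)
        return [ti.key for ti in tis]


class ToyKubernetesExecutor:
    """Assumed k8s-style cleanup: only delete the pod, state is not touched."""

    def __init__(self):
        self.pods = {}

    def cleanup_stuck_queued_tasks(self, tis):
        for ti in tis:
            self.pods.pop(ti.key, None)
        return [ti.key for ti in tis]


def fail_tasks_stuck_in_queued(task_instances, executors, timeout, now):
    """Group TIs queued for longer than `timeout` by executor and delegate cleanup."""
    stuck_by_executor = {}
    for ti in task_instances:
        if ti.state == "queued" and now - ti.queued_at > timeout:
            stuck_by_executor.setdefault(ti.executor, []).append(ti)
    cleaned = []
    for name, tis in stuck_by_executor.items():
        cleaned.extend(executors[name].cleanup_stuck_queued_tasks(tis))
    return cleaned


now = datetime(2024, 8, 5, 12, 0, 0)
tis = [
    ToyTaskInstance("celery_task", "celery", now - timedelta(minutes=20)),
    ToyTaskInstance("k8s_task", "kubernetes", now - timedelta(minutes=20)),
    ToyTaskInstance("fresh_task", "celery", now - timedelta(minutes=1)),
]
celery, k8s = ToyCeleryExecutor(), ToyKubernetesExecutor()
celery.running = {"celery_task": object(), "fresh_task": object()}
k8s.pods = {"k8s_task": "pod-1"}

cleaned = fail_tasks_stuck_in_queued(
    tis, {"celery": celery, "kubernetes": k8s}, timedelta(minutes=10), now
)
for ti in tis:
    print(ti.key, ti.state)
```

In this toy model the k8s task loses its pod but stays in the queued state, which is exactly the gap I am asking about: something outside the cleanup method would have to move it to failed.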
