stijndehaes commented on issue #17370:
URL: https://github.com/apache/airflow/issues/17370#issuecomment-895995979


   > > > > May be related to: #16625
   > > > 
   > > > 
   > > > If so, it's resolved in #16301, which will be released in 2.1.3.
   > > 
   > > 
   > > @ephraimbuddy I don't think #16301 solves #16625. The code in that PR only runs once the container has started, but the issue in #16625 happens when a pod never starts and still fails. I added more info to issue #16625; maybe it sheds more light on what might be going on.
   > 
   > Oh, I see. We had a similar issue with a task getting stuck in queued: a pod hit an error while starting (or something else went wrong), and the executor reported that the task had failed while the scheduler still saw it as queued. To stop the task from getting stuck, we made change #15929, which has not been released yet.
   > I will take a closer look and see if we can make this work properly.
   
   Ah, I thought this code was already active in our environment, but it isn't 😅 If you could look into making this more robust, that would be great. If I can provide more info or help, please let me know.