potiuk commented on issue #25021:
URL: https://github.com/apache/airflow/issues/25021#issuecomment-1187750314

   Is this happening for all tasks that were running when the scheduler was 
restarted, or only some of them? Any more information or guesses about which 
tasks might be affected? Also, what kind of deployment do you have: Kubernetes 
Executor? Kubernetes Celery Executor? Can you please elaborate on that?
   
   BTW, why do you have such frequent restarts of the PGSQL/Kubernetes API? This 
seems like something you should address in general. It is very likely the root 
of the problem, because if you have flakiness in the DB and Kubernetes APIs, 
there is no way the Airflow implementation can cover it up. Airflow uses 
PRECISELY the DB and Kubernetes APIs to store information and query for it so 
that it is able to recover from any kind of crash. If either of those is flaky, 
it might basically mean that Airflow will not be able to recover (because the 
flakiness prevented it from storing the necessary information or querying it).
   
   This is a rather unsolvable conundrum: we actually rely on a stable deployment 
underneath. And I believe that should be handled first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
