potiuk commented on issue #56959:
URL: https://github.com/apache/airflow/issues/56959#issuecomment-3427305580

   Without any more details - it's impossible to help or respond in any other 
way than "well, something is wrong". 
   
   This might be anything - including excessive resource consumption: lack of 
memory, sockets, etc.
   
   I think that without some kind of debugging of where the process is stopped, 
there is no way it can be diagnosed.
   
   My proposal - try to capture the moment when it happens and diagnose the 
process itself. There are a number of utilities that can usually be used for 
this; this Stack Overflow thread lists several of them: 
https://stackoverflow.com/questions/3443607/how-can-i-tell-where-my-python-script-is-hanging
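
   One option of that kind, as a minimal sketch only: Python's standard 
`faulthandler` module can dump the traceback of every thread when the process 
receives a signal. The helper name `enable_stack_dump`, the choice of `SIGUSR1` 
and the log path are assumptions made for illustration, and where to call it 
in an Airflow deployment is not something confirmed here. 
`faulthandler.register` is not available on Windows.

```python
import faulthandler
import signal


def enable_stack_dump(log_path, sig=signal.SIGUSR1):
    """Dump the tracebacks of all threads to log_path whenever sig is received.

    Hypothetical helper: it has to run inside the process you want to inspect,
    before the hang happens. The returned file object must stay open for as
    long as the handler is registered.
    """
    log_file = open(log_path, "w")
    # faulthandler writes directly to the file descriptor from its signal
    # handler, so it still works when the interpreter is stuck in a blocking call.
    faulthandler.register(sig, file=log_file, all_threads=True)
    return log_file
```

   When the process appears stuck, `kill -USR1 <pid>` should then write the 
current stack of each thread to the chosen file. For a process that is already 
running without such a hook, an external tool such as `py-spy dump --pid <pid>` 
is an alternative.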
   
   Also your k8s monitoring might be useful (I guess you have some 
monitoring), for example if it shows excessive usage of resources. If we can 
get any feedback, then we might be able to help with the diagnosis.
   
   As a workaround in the meantime - generally the dag file processor can be 
safely restarted at any time - you can employ a health check that checks for 
processing progress and restarts the processor if the check fails. 
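
   A rough sketch of such a check, under the assumption that the processor 
keeps writing log files while it makes progress: treat it as unhealthy when no 
file in its log directory has been modified recently. The function names, the 
log directory and the threshold are all hypothetical; this is not an Airflow 
API, and a different progress signal may suit your setup better.

```python
import os
import time


def newest_mtime(directory):
    """Return the latest modification time of any file under directory, or None."""
    newest = None
    for root, _dirs, files in os.walk(directory):
        for name in files:
            try:
                mtime = os.path.getmtime(os.path.join(root, name))
            except OSError:
                continue  # file disappeared (e.g. log rotation) between listing and stat
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def is_making_progress(directory, max_age_seconds, now=None):
    """True if some file under directory was modified within max_age_seconds."""
    now = time.time() if now is None else now
    latest = newest_mtime(directory)
    return latest is not None and (now - latest) <= max_age_seconds


def main(argv):
    """Return 0 when progress was seen recently and 1 otherwise.

    argv is [log_directory, max_age_seconds]; both values are assumptions
    that you would choose for your own deployment.
    """
    log_dir, max_age = argv[0], float(argv[1])
    return 0 if is_making_progress(log_dir, max_age) else 1
```

   A k8s exec liveness probe could run a small script that calls 
`sys.exit(main(sys.argv[1:]))`, so that the container is restarted once the 
check has failed for the configured number of attempts.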
   
   Eventually - migration to Airflow 3 is strongly recommended. The standalone 
dag processor is the only option there, and it solves many more issues and 
architectural deficiencies of Airflow 2 - and migration might take less time 
for you than diagnosing the current state (also, there will be very limited 
work on Airflow 2; most likely most of the responses you will get - unless 
there are very clear and easy to diagnose issues - will be "migrate to 
Airflow 3"). 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
