potiuk commented on issue #56959: URL: https://github.com/apache/airflow/issues/56959#issuecomment-3427305580
Without more details, it's impossible to help or to respond in any other way than "well, something is wrong". This might be anything, including some kind of resource exhaustion: lack of memory, sockets, etc. I think that without some kind of debugging of where the process is stopped, there is no way this can be diagnosed.

My proposal: try to capture the moment when it happens and diagnose the process itself. There are a number of utilities that can usually be used for this; this Stack Overflow thread lists several of them: https://stackoverflow.com/questions/3443607/how-can-i-tell-where-my-python-script-is-hanging

Your k8s setup and monitoring might also be useful (I guess you have some monitoring), for example by showing excessive usage of resources. If we can get any feedback, then we might be able to help with the diagnosis.

As a workaround in the meantime: generally, the dag file processor can be safely restarted at any time, so you can employ a health check that checks for processing progress and restarts the processor if the check fails.

Eventually, migration to Airflow 3 is strongly recommended. The standalone dag processor is the only option there, and it solves many more issues and architectural deficiencies of Airflow 2. Migration might take less time for you than diagnosing the current state. Also, there will be very limited work on Airflow 2; most likely, most of the responses you will get will be "migrate to Airflow 3", unless there are very clear and easy to diagnose issues.
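To make the health-check workaround mentioned above a bit more concrete, here is a minimal sketch of a staleness check that could back a k8s exec liveness probe. It is not an Airflow API. It assumes that "processing progress" can be approximated by the newest modification time of files in the dag processor's log directory, and both the default path and the 600 second threshold are hypothetical values you would need to adapt to your deployment.

```python
import os
import sys
import time


def newest_mtime(root):
    """Return the newest modification time of any file under root, or None."""
    newest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                mtime = os.path.getmtime(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished between listing and stat
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def is_stalled(last_progress, now, max_age_seconds):
    """True when there is no progress signal or it is older than allowed."""
    return last_progress is None or (now - last_progress) > max_age_seconds


def main(argv):
    # Hypothetical defaults for illustration only; adjust to your setup.
    log_dir = argv[1] if len(argv) > 1 else "/opt/airflow/logs/dag_processor_manager"
    max_age = float(argv[2]) if len(argv) > 2 else 600.0
    stalled = is_stalled(newest_mtime(log_dir), time.time(), max_age)
    # Non-zero return value is meant to be used as the probe's exit code.
    return 1 if stalled else 0
```

To use it as a probe command, the script would end with `sys.exit(main(sys.argv))`, so that a stalled processor produces a non-zero exit code and k8s restarts the container. Whether log file timestamps are a good enough progress signal depends on your logging configuration, so treat this as a starting point and not as a recommended implementation.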
