potiuk commented on issue #31200: URL: https://github.com/apache/airflow/issues/31200#issuecomment-1610806779
As usual - look at the logs and your monitoring systems. Quite often the cause is an issue with the deployment and its resources - including lack of memory, CPU, or disk, or I/O limitations.

Look at the logs and analyse whether there are any warnings or information indicating that the scheduler goes down, or errors and warnings that would indicate abnormal behaviour. Generally, if you see any warnings or errors, you should make sure to address them - usually you can find the reasons by reading the messages and applying what they say, or you can use the Airflow docs, Google, or search the issues and discussions here to find out whether others have had similar problems. Usually you will find answers, other people's logs, and some fix suggestions if people had a similar issue. Nothing really special - just the usual way of running open-source software like this, where we prepare documentation and have forums where people share their problems and solutions. The usual open-source community.

Look at your monitoring - depending on the deployment (as with any other software) you should have some way to monitor memory, CPU, I/O, and other resources, and they might show anomalies that cause instabilities that will not be visible in the Airflow logs (for example, because you do not have enough resources to run Airflow and it gets killed externally). Again, this is nothing special for Airflow - it is the standard way any application would be managed and monitored in your deployment, so you can apply the techniques that you usually apply to your other apps.

Managing your deployment is an important responsibility of people like you (Deployment Managers) - we are just releasing the Airflow software, but it's the Deployment Managers who need to manage, monitor, and tune Airflow - following the documentation we release together with the software. And it also varies across the kind of deployment that the Deployment Manager chose - for example, a lot of the monitoring and tuning that the Deployment Manager would have to do is handled for them by the managed service if you choose to run a managed service rather than deploy Airflow on your own. Generally speaking, you have this page https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html which describes what kind of skills, what kind of effort, and which parts of the deployment are expected from a Deployment Manager like you, depending on the choice of deployment you make.

If you are unsure and just learning Airflow, you can play with the allocation of those resources if you have no monitoring in place - making memory bigger, increasing CPU limits, etc. This is a valid technique. Airflow has many knobs to turn and many options you can configure, and it also runs your own code that you provide, which might change Airflow's expectations for resources, so it's a very valid technique to try-and-see rather than "foresee" what kind of resources you might need. In process control theory, this is a very good approach when the system has many variables. You go in the loop "guess what can be changed -> change -> observe -> see the impact -> loop back". That is process control with a feedback loop.

Finally, you can also play with fine-tuning the scheduler - depending on how your Airflow is deployed (which database, which filesystem, what architecture and executor you chose), you have many knobs to turn in the configuration (again, it's the Deployment Manager's job to fine-tune it to the right configuration). This page https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/scheduler.html#fine-tuning-your-scheduler-performance has a more detailed explanation of the knobs you can turn, what effects they have, and which parts of the system you should diagnose and observe in order to make your decisions.

I hope it will help.

--
This is an automated message from the Apache Git Service.
