potiuk commented on issue #31200:
URL: https://github.com/apache/airflow/issues/31200#issuecomment-1610806779

   As usual, look at the logs and your monitoring systems. Quite often the 
cause is an issue with the deployment or its resources, including lack of 
memory, CPU, or disk, or I/O limitations. 
   
   Look at the logs and check whether there are any warnings or information 
indicating that the scheduler goes down, or errors and warnings that would 
indicate abnormal behaviour. Generally, if you see any warnings or errors you 
should make sure to address them. Usually you can find the reasons by reading 
the messages and applying what they say, or you can use the Airflow docs, 
Google, or a search of the issues and discussions here to find out whether 
others have had similar problems. Usually you will find answers, other 
people's logs, and some suggested fixes if people had a similar issue.
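   
   As a minimal sketch of that kind of log review, assuming your log lines 
look like "[timestamp] {file.py:123} LEVEL - message" (adjust the pattern if 
your deployment formats logs differently), something like this groups the 
warnings and errors so the most frequent ones stand out. The function name and 
the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed log line shape: "[timestamp] {file.py:123} LEVEL - message".
# Adjust the pattern if your deployment uses a different log format.
LEVEL_RE = re.compile(r"\b(WARNING|ERROR|CRITICAL)\b\s*-\s*(.*)")


def summarize_log(lines):
    """Count WARNING/ERROR/CRITICAL messages, grouped by level and text."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            level, message = match.groups()
            counts[(level, message.strip())] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines; in practice, iterate over your scheduler log file.
    sample = [
        "[2023-06-28T10:00:00+0000] {example.py:10} INFO - Example info text",
        "[2023-06-28T10:00:05+0000] {example.py:20} WARNING - Example warning text",
        "[2023-06-28T10:00:09+0000] {example.py:20} WARNING - Example warning text",
        "[2023-06-28T10:00:12+0000] {example.py:30} ERROR - Example error text",
    ]
    # Most frequent messages first.
    for (level, message), n in summarize_log(sample).most_common():
        print(f"{n:>4}  {level:<8} {message}")
```

   The repeated messages are usually the ones worth searching for in the docs 
and in the issues and discussions.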
   
   Nothing really special - just the usual way when you run open-source 
software like this, where we prepare documentation and have forums where people 
share their problems and solutions. The usual open-source community.
   
   Look at your monitoring - depending on the deployment (as with any other 
software) you should have some way to monitor memory, CPU, I/O, and other 
resources, and they might show anomalies that cause instabilities that will 
not be visible in the Airflow logs (for example, because you do not have 
enough resources to run Airflow and it gets killed externally). Again, this is 
nothing special for Airflow - it is the standard way any application would be 
managed and monitored in your deployment, so you can apply the techniques that 
you usually apply to your other apps. 
Managing your deployment is an important responsibility of people like 
you (Deployment Managers) - we just release the Airflow software, but it is 
the Deployment Managers who need to manage, monitor, and tune Airflow, 
following the documentation we release together with the software. It also 
varies across the deployment that the Deployment Manager chose - for example, 
a lot of the monitoring and tuning that the Deployment Manager would have to do 
is handled for them by the managed service if you choose to run a managed 
service rather than deploy it on your own. Generally speaking, this page 
https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html 
describes what kind of skills, what kind of effort, and which part of 
the deployment is expected from a Deployment Manager like you, depending on 
the choice of deployment you make.
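   
   If you have no monitoring at all yet, a rough starting point could look 
like the sketch below. It uses only the Python standard library, the function 
names are made up, and the thresholds are arbitrary assumptions rather than 
recommendations. A real deployment would normally rely on its usual monitoring 
stack instead.

```python
import os
import shutil


def resource_snapshot(path="/"):
    """Return a small dict of host-level resource readings (stdlib only)."""
    disk = shutil.disk_usage(path)
    used_pct = 100 * disk.used / disk.total if disk.total else 0.0
    snapshot = {
        "disk_used_pct": round(used_pct, 1),
        "cpu_count": os.cpu_count(),
    }
    # os.getloadavg() exists only on Unix-like systems and may raise OSError.
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_avg_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    return snapshot


def find_anomalies(snapshot, max_disk_pct=90.0, max_load_per_cpu=1.5):
    """Flag readings above the (arbitrary, assumed) thresholds."""
    problems = []
    if snapshot["disk_used_pct"] > max_disk_pct:
        problems.append(f"disk {snapshot['disk_used_pct']}% used")
    load = snapshot.get("load_avg_1m")
    cpus = snapshot.get("cpu_count") or 1
    if load is not None and load / cpus > max_load_per_cpu:
        problems.append(f"load average {load} on {cpus} CPUs")
    return problems


if __name__ == "__main__":
    snap = resource_snapshot()
    print(snap)
    print(find_anomalies(snap) or "no anomalies by these thresholds")
```

   Running something like this periodically next to the scheduler can show 
resource pressure that never appears in the Airflow logs.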
   
   If you are unsure and just learning Airflow, you can play with the 
allocation of those resources if you have no monitoring in place - making 
memory bigger, increasing CPU limits, etc. This is a valid technique. 
Airflow has many knobs to turn and many options you can configure, and it also 
runs your own code that you provide, which might change Airflow's expectations 
for resources, so it is a very valid technique to try-and-see rather than 
"foresee" what kind of resources you might need. In process control theory, 
this is a very good approach when the system has many variables. You go in the 
loop "guess what can be changed -> change -> observe -> see the impact -> loop 
back". That is process control with a feedback loop.
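   
   To make that loop concrete, here is a small sketch. Everything in it is 
hypothetical: the `tune` function and its callbacks are hooks you would have 
to write for your own deployment, and the "system" in the example is simulated 
(it is "killed" whenever its memory limit is below 3000 MB).

```python
def tune(setting, apply_change, observe, is_healthy, step, max_rounds=10):
    """Try-and-see loop over one knob.

    All callables are hypothetical hooks supplied by the caller.
    Returns the history as a list of (setting, observation) pairs;
    the last pair is the last setting that was actually tried.
    """
    history = []
    for _ in range(max_rounds):
        apply_change(setting)            # change
        observation = observe()          # observe
        history.append((setting, observation))
        if is_healthy(observation):      # see the impact
            break
        setting = step(setting)          # guess the next change, loop back
    return history


if __name__ == "__main__":
    # Simulated system (an assumption for illustration only).
    state = {"limit_mb": 0}
    history = tune(
        setting=1024,
        apply_change=lambda mb: state.update(limit_mb=mb),
        observe=lambda: "killed" if state["limit_mb"] < 3000 else "stable",
        is_healthy=lambda obs: obs == "stable",
        step=lambda mb: mb * 2,
    )
    for setting, observation in history:
        print(setting, observation)
```

   In a real deployment the "observe" step is the slow part: you let the 
system run for a while and look at logs and monitoring before you loop back.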
   
   Finally, you can also play with fine-tuning the scheduler - depending on how 
your Airflow is deployed, which database, which filesystem, and what 
architecture and executor you chose, you have many knobs to turn in the 
configuration (again, it is the Deployment Manager's job to fine-tune it to the 
right configuration). This page 
https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/scheduler.html#fine-tuning-your-scheduler-performance
 has a more detailed explanation of the knobs you can turn, what effects they 
have, and which parts of the system you should diagnose and observe in order to 
make your decisions.
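   
   As an illustration of turning those knobs, Airflow options can be 
overridden with environment variables named AIRFLOW__{SECTION}__{KEY}. The 
sketch below just builds such names. The option names are ones I recall from 
the Airflow 2.x scheduler section, so check the configuration reference for 
your version, and treat the values as placeholders, not recommendations.

```python
def airflow_env_var(section, key):
    """Build the environment variable name that overrides [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Placeholder values for illustration only; the right numbers depend on
# your database, filesystem, executor, and DAGs.
overrides = {
    ("scheduler", "parsing_processes"): "4",
    ("scheduler", "min_file_process_interval"): "60",
    ("scheduler", "max_dagruns_per_loop_to_schedule"): "20",
}

for (section, key), value in overrides.items():
    print(f"export {airflow_env_var(section, key)}={value}")
```

   Change one option at a time and observe the effect before touching the 
next one, otherwise you will not know which change made the difference.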
   
   I hope it will help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
