george-zubrienko commented on issue #30884:
URL: https://github.com/apache/airflow/issues/30884#issuecomment-1524913997

   I remember looking at that commit, but I agree: even if it is causing this 
behaviour, it is not an easy guess at all. For example, these are metrics 
from the pgbouncer pod that guards our Postgres while it ran with 2.5.3. There 
was not much load at all, and the cluster was not scheduling anything, so that 
load comes from the webserver + DAG processor.
   
   
![image](https://user-images.githubusercontent.com/14901777/234783295-4d2bd6ff-65db-4221-a409-d0be098e69df.png)
   
   In production, our xact/s ranges from ~90 to ~300, depending on the number of 
running tasks.
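For context, an xact/s figure like this is typically a rate derived from a cumulative transaction counter sampled at two points in time (for example, pgbouncer's `total_xact_count` column from `SHOW STATS`; treating that as the source here is an assumption). The sketch below shows only that arithmetic. `CounterSample` and `xact_per_second` are hypothetical names for illustration, not part of pgbouncer, Postgres, or Airflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """One reading of a cumulative transaction counter (hypothetical structure)."""
    timestamp_s: float  # wall-clock time of the reading, in seconds
    total_xact: int     # cumulative transaction count at that time


def xact_per_second(prev: CounterSample, curr: CounterSample) -> float:
    """Average transactions per second between two cumulative counter samples."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.total_xact - prev.total_xact
    if delta < 0:
        # The counter went backwards (e.g. the pooler restarted), so assume it
        # was reset near the start of the interval and count from zero.
        delta = curr.total_xact
    return delta / elapsed
```

For example, two samples taken 60 seconds apart whose counters differ by 5400 give 90.0 xact/s, the low end of the range quoted above.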
   
   However, there was one change that stood out, though I could not explain it. 
After the upgrade, these were the key metrics from Postgres itself (Azure 
Single Server, by the way):
   
   
![image](https://user-images.githubusercontent.com/14901777/234784571-bb808a7f-27c2-4ec2-9731-4960c9b8e463.png)
   
   You can see a significant drop in traffic between the DB and the Airflow 
cluster after we rolled out 2.5.3. Maybe it is nothing, but I'm sharing it so 
you have a more or less complete picture from our end.
   
   Since we already have an image with 2.5.3, I can add a layer on top with a 
modified `manager.py` and use that image for the DAG processor specifically; 
that way we would test exactly the mentioned commit. I'll try to find time 
today or Friday to run that test.

