george-zubrienko commented on issue #30884: URL: https://github.com/apache/airflow/issues/30884#issuecomment-1524913997
I remember looking at that commit, but I agree: even if it is causing this behaviour, this is not an easy guess at all. For example, these are metrics from the pgbouncer pod that guards our Postgres, when it ran with 2.5.3 - not much load at all, and the cluster was not scheduling anything, so that load is webserver + dag processor. In production our xact/s is between ~90 and ~300, depending on the number of running tasks.

However, there was one change that stands out, but I could not explain it. After the upgrade, these are key metrics from Postgres itself (Azure Single Server, btw): you can see there is a significant drop in traffic between the db and the airflow cluster after we rolled out 2.5.3. Maybe it is nothing, but just so you have a more or less complete picture from our end.

Since we already have an image with 2.5.3, I can add a layer on top with a modified `manager.py` and use that image for the dag processor specifically - that way we would test exactly the mentioned commit. I'll try to find time today or Friday to run such a test.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
