potiuk commented on PR #41870: URL: https://github.com/apache/airflow/pull/41870#issuecomment-2323093079
Also, another question: https://airflow.apache.org/docs/apache-airflow-providers-celery/stable/configurations-ref.html#result-backend-sqlalchemy-engine-options -> there is also a possible pool configuration, so the celery backend can use pooling. As you mentioned, NullPool is used only when the process is not forked. But... what does it mean that the process is not forked, and is that the case for task submission and result checking? Again, I am not that deep into the celery interface, but I think the code you pointed at is called only on the worker side (i.e. when the worker wants to write status to the result backend), and it's not the same code that is used on the client side when tasks are submitted and queried for status. Maybe there is another reason why pooled connections are not used in your case.

Also, another question: are you using pgbouncer for your result backend? I believe this might be the actual root cause of the overhead. It's indeed quite slow to open and close a connection to a Postgres database directly, because Postgres needs to fork a new process and reserve some memory. But if you have pgbouncer on a local network, opening and closing a connection to the pgbouncer instance should be super-quick, because pgbouncer does the "real" DB connection pooling in this case. And PGBouncer is absolutely recommended for all "serious" Airflow installations on Postgres.

I am not against this change, however I think we need to understand better where the "huge" inefficiency you noticed comes from. I'd find it quite strange for it to go unnoticed after so many years of Airflow using celery in multiple, even huge, installations, so I suspect what you see is the result of some environmental setup that simply "boosts" the hotness of that part artificially.
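To make the pooling question concrete, here is a toy sketch of the difference between opening a connection per query (NullPool-like behaviour) and reusing one from a pool. It is only an illustration under stated assumptions: it uses the standard-library `sqlite3` module rather than Postgres, and `CountingConnector`, `TinyPool`, `run_without_pool` and `run_with_pool` are hypothetical names, not part of Celery, SQLAlchemy or Airflow. It counts how many connections get opened and says nothing about real timings.

```python
import sqlite3


class CountingConnector:
    """Hypothetical helper: opens sqlite3 connections and counts them."""

    def __init__(self):
        self.opened = 0

    def connect(self):
        self.opened += 1
        return sqlite3.connect(":memory:")


def run_without_pool(connector, n_queries):
    # NullPool-like behaviour: open and close a connection for every query.
    for _ in range(n_queries):
        conn = connector.connect()
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()


class TinyPool:
    """Hypothetical minimal pool that keeps up to `size` idle connections."""

    def __init__(self, connector, size=1):
        self._connector = connector
        self._size = size
        self._idle = []

    def acquire(self):
        # Reuse an idle connection if there is one, otherwise open a new one.
        return self._idle.pop() if self._idle else self._connector.connect()

    def release(self, conn):
        # Keep the connection for reuse unless the pool is already full.
        if len(self._idle) < self._size:
            self._idle.append(conn)
        else:
            conn.close()

    def close_all(self):
        while self._idle:
            self._idle.pop().close()


def run_with_pool(connector, n_queries, size=1):
    pool = TinyPool(connector, size)
    for _ in range(n_queries):
        conn = pool.acquire()
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            pool.release(conn)
    pool.close_all()


unpooled = CountingConnector()
run_without_pool(unpooled, 100)

pooled = CountingConnector()
run_with_pool(pooled, 100)

print(unpooled.opened, pooled.opened)  # prints "100 1"
```

With a direct Postgres connection, each of those 100 opens would mean a new backend process on the server, which is the cost that either client-side pooling or pgbouncer is meant to avoid.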
