potiuk commented on PR #41870:
URL: https://github.com/apache/airflow/pull/41870#issuecomment-2323093079

   Also another question: 
   
   
https://airflow.apache.org/docs/apache-airflow-providers-celery/stable/configurations-ref.html#result-backend-sqlalchemy-engine-options
 -> there is also a pool configuration option, so the celery result backend can 
use pooling. As you mentioned, NullPool is used only when the process is not 
forked. But... what does "not forked" mean here, and is that the case for 
task submission and result checking? 
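
   To make the question concrete, here is a rough sketch of what I mean. The 
pool values are made up, the env-var name just follows the usual 
AIRFLOW__{SECTION}__{KEY} convention, and `effective_engine_kwargs` is a 
hypothetical helper that only paraphrases (not verbatim) how I understand the 
celery database backend to choose its engine arguments - so treat it as an 
assumption to verify, not as the actual celery code:

```python
import json

# Hypothetical pool settings, for illustration only.
engine_options = {"pool_size": 5, "pool_recycle": 1800, "pool_pre_ping": True}

# Airflow reads this option as a JSON dict.
env_name = "AIRFLOW__CELERY__RESULT_BACKEND_SQLALCHEMY_ENGINE_OPTIONS"
env_value = json.dumps(engine_options)


def effective_engine_kwargs(forked, **kwargs):
    """Simplified paraphrase of the forked / not-forked branching (assumption)."""
    if forked:
        # Forked process: pool options are passed through to the engine.
        return dict(kwargs)
    # Not forked: every "pool*" option is dropped and NullPool is forced.
    stripped = {k: v for k, v in kwargs.items() if not k.startswith("pool")}
    stripped["poolclass"] = "NullPool"  # stand-in for sqlalchemy.pool.NullPool
    return stripped


print(f"{env_name}={env_value}")
print("not forked:", effective_engine_kwargs(False, **engine_options))
print("forked:    ", effective_engine_kwargs(True, **engine_options))
```

   If that reading is right, the pool settings above would have no effect in 
any process that is not forked, which is why I am asking which kind of process 
does the submitting and the status checks.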
   
   Again - I am not that deep into the celery interface - but I think the code 
you pointed at is called only on the worker side (i.e. when the worker wants to 
write status to the result backend), and it's not the same code that is used on 
the client side when tasks are submitted and queried for status. Maybe there 
is another reason why connection pooling is not used in your case. 
   
   Also - another question - are you using pgbouncer for your result backend? 
Because I believe this might be the actual root cause of the overhead. It's 
indeed quite slow to open and close a connection to a Postgres database 
directly - because it needs to fork a new process and reserve some memory - but 
if you have pgbouncer on a local network, opening and closing a connection to 
the pgbouncer instance should be super-quick - because pgbouncer does "real" DB 
connection pooling in this case. And pgbouncer is absolutely recommended for 
all "serious" Airflow installations on Postgres.
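
   One way to check this would be to time a number of open/close cycles 
against Postgres directly and against pgbouncer. A minimal sketch, assuming 
you supply your own `connect` callable; the helper name, the psycopg2 usage 
and the DSNs in the comments are hypothetical placeholders, not anything taken 
from your setup:

```python
import time
from statistics import mean


def time_connect_cycles(connect, cycles=100):
    """Return the mean seconds per open+close cycle for `connect`."""
    samples = []
    for _ in range(cycles):
        start = time.perf_counter()
        conn = connect()  # open a fresh connection
        conn.close()      # and close it again straight away
        samples.append(time.perf_counter() - start)
    return mean(samples)


# Hypothetical usage, assuming psycopg2 is installed and these hosts exist:
#   import psycopg2
#   direct = time_connect_cycles(
#       lambda: psycopg2.connect("host=db port=5432 dbname=airflow"))
#   pooled = time_connect_cycles(
#       lambda: psycopg2.connect("host=pgbouncer port=6432 dbname=airflow"))
#   print(f"direct={direct:.4f}s pgbouncer={pooled:.4f}s")
```

   If the two numbers are close, connection setup is probably not the hot 
spot and the overhead comes from somewhere else.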
   
   I am not against this change, however I think we need to understand better 
where the "huge" inefficiency you noticed comes from - I'd find it quite 
strange for it to have gone unnoticed after so many years of Airflow using 
celery in multiple - even huge - installations, so I suspect what you see is a 
result of some environmental setup that simply "boosts" the hotness of that 
part artificially.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
