Re: [PR] Fix: Reuse ProcessPoolExecutor in CeleryExecutor [airflow]

via GitHub Thu, 09 May 2024 22:14:56 -0700


potiuk commented on PR #39484:
URL: https://github.com/apache/airflow/pull/39484#issuecomment-2103879926


   > At the beginning, there may be no obvious difference in the task 
throughput of the two implementation solutions. After a long time (such as 1 
day), the throughput of the existing implementation solution will gradually 
decrease. What I compare is the stable result.
   
   I cannot think of a single reason (on the client side) why that would happen, 
or why switching to a long-running pool would change it. The pool context 
manager waits until all processes started in the pool complete their tasks, 
then closes the pool, including all of its processes, freeing all the 
resources. So there is no reason why "airflow" would slow down.
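
The context-manager behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not Airflow's actual code: `send_batch_with_temporary_pool` is a hypothetical name, `os.getpid` stands in for the real "send a task" call, and the `fork` start method is an assumption that only holds on POSIX hosts.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor


def send_batch_with_temporary_pool(n_tasks, workers=2):
    """Hypothetical stand-in for one send cycle using a short-lived pool."""
    # Assumption: POSIX host, so the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        # os.getpid stands in for the real work done in each worker process.
        futures = [pool.submit(os.getpid) for _ in range(n_tasks)]
        worker_pids = {f.result() for f in futures}
    # Leaving the "with" block calls pool.shutdown(wait=True): pending work
    # has finished and the worker processes have been joined at this point.
    return pool, worker_pids
```

After the function returns, the pool is shut down, so a further `pool.submit(...)` raises `RuntimeError` instead of silently keeping processes alive.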
   
   However, maybe this is a problem with your firewall/networking or with 
rabbitmq misbehaving. The main difference between a long-running pool of 
processes and processes started temporarily to send the tasks is that the 
long-running processes **might** (depending on how the client API is 
implemented) reuse an open connection to the broker to send the tasks rather 
than open a new one. But if the rabbitmq server is implemented properly, this 
should not make sending take longer and longer over time, because after the 
processes sending the tasks are closed, rabbitmq should free all the resources 
on the server side. 
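
To make the difference between the two approaches concrete, here is a rough sketch that only compares worker process identity. Both function names are hypothetical, `os.getpid` stands in for sending a task, and the `fork` start method assumes a POSIX host. It does not show an actual broker connection; it only shows that a long-running pool keeps the same worker process alive across batches, which is the precondition for any per-process connection being reused.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor

# Assumption: POSIX host, so the "fork" start method is available.
FORK = multiprocessing.get_context("fork")


def pids_with_temporary_pools(batches):
    """Create and tear down a fresh single-worker pool for every batch."""
    seen = []
    for _ in range(batches):
        with ProcessPoolExecutor(max_workers=1, mp_context=FORK) as pool:
            seen.append(pool.submit(os.getpid).result())
    return seen


def pids_with_long_running_pool(batches):
    """Reuse one single-worker pool for all batches."""
    with ProcessPoolExecutor(max_workers=1, mp_context=FORK) as pool:
        return [pool.submit(os.getpid).result() for _ in range(batches)]
```

With the long-running pool, every batch reports the same worker PID. With temporary pools, each batch starts a new process, so any state cached inside the worker (such as an open connection, if the client library keeps one) is discarded between batches.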
   
   So maybe the problem is that your rabbitmq leaks resources when processes 
sending tasks to it shut down? Do you have some monitoring, or can you 
please provide some data to back up the hypothesis that rabbitmq is actually 
leaking resources in this case? I think if airflow gets slower and slower over 
time, you should be able to see some resources leaking on either side, memory 
and CPU being the most likely candidates, and my hypothesis is that it's 
rabbitmq misbehaving (which might be, for example, a known bug in some old 
version of rabbitmq). A quick search reveals similar behaviour observed in some 
old versions of rabbitmq: 
https://groups.google.com/g/rabbitmq-users/c/v630G6OCxuU (I have not looked at 
that conversation in detail; it's just likely that it might be something 
similar). Can you please take a close look at your rabbitmq side and see if you 
can observe some resource leaks when you go back to the current solution, and 
maybe upgrade rabbitmq to the latest version to exclude the possibility that it 
is some old bug? 


