potiuk commented on PR #39484: URL: https://github.com/apache/airflow/pull/39484#issuecomment-2103879926
> At the beginning, there may be no obvious difference in the task throughput of the two implementation solutions. After a long time (such as 1 day), the throughput of the existing implementation solution will gradually decrease. What I compare is the stable result.

I cannot think of a single reason (on the client side) why this would happen, or why changing to a long-running pool would change it. The pool context manager waits until all processes started in the pool have completed their tasks. It then closes the pool, including all of its processes, which frees all their resources. So there is no reason why "airflow" would slow down. However, maybe this is a problem with your firewall, your networking, or rabbitmq behaving wrongly.

The main difference between a long-running pool of processes and processes started temporarily to send the tasks is that the long-running processes **might** (depending on how the client API is implemented) reuse an open connection to the broker to send the tasks, rather than open a new one. But if the rabbitmq server is implemented properly, this should not make anything slower and slower over time, because after the processes sending the tasks are closed, rabbitmq should free all the resources on the server side. So maybe the problem is that your rabbitmq leaks resources when the processes sending tasks to it shut down?

Do you have some monitoring, or can you please provide some data to back up the statement that rabbitmq is actually leaking resources in this case? I think if airflow gets slower and slower over time, you should be able to see some resources leaking on either side, memory and CPU being the most likely candidates. My hypothesis is that it's rabbitmq misbehaving (which might be, for example, a known bug in some old version of rabbitmq).
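To make the difference between a short-lived pool and a long-running pool concrete, here is a minimal sketch using Python's `concurrent.futures.ProcessPoolExecutor`. The choice of pool class is an assumption, since the comment does not name it. `_open_connection` and `send_task` are hypothetical stand-ins, not Airflow or Celery code: the initializer runs once per worker process and records a fake "connection" labelled with that worker's PID. In the short-lived variant, leaving the `with` block calls `shutdown(wait=True)`, so every task finishes and every worker (with its connection) is torn down before the function returns. The long-running variant keeps the same workers, and therefore the same connections, across batches.

```python
import os
from concurrent.futures import ProcessPoolExecutor

# Hypothetical per-worker "broker connection". A real client would open a
# socket to the broker here; this sketch only records which worker owns it.
_connection = None


def _open_connection():
    """Pool initializer: runs once in each worker process when it starts."""
    global _connection
    _connection = f"conn-{os.getpid()}"


def send_task(task_id):
    """Hypothetical task sender: reports which connection it used."""
    return task_id, _connection


def send_batch_short_lived(task_ids, workers=2):
    """Fresh worker processes (and connections) for every batch.

    Leaving the `with` block calls shutdown(wait=True): it waits for all
    submitted tasks, then stops the workers and releases their resources.
    """
    with ProcessPoolExecutor(max_workers=workers, initializer=_open_connection) as pool:
        return list(pool.map(send_task, task_ids))


class LongRunningSender:
    """Keeps one pool alive so that workers (and connections) are reused."""

    def __init__(self, workers=2):
        self._pool = ProcessPoolExecutor(max_workers=workers, initializer=_open_connection)

    def send_batch(self, task_ids):
        return list(self._pool.map(send_task, task_ids))

    def close(self):
        # Same cleanup the context manager performs, but only once, at the end.
        self._pool.shutdown(wait=True)
```

With `LongRunningSender(workers=2)`, two consecutive batches report at most two distinct connection labels, while each call to `send_batch_short_lived` gets brand-new worker processes. On the client side both variants release everything they opened. The only extra load the short-lived variant creates is on the broker, which has to accept and then clean up the repeated connections.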
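One way to gather the kind of data asked for above, assuming the RabbitMQ management plugin is enabled, is to poll its HTTP API and check whether the number of connections or channels keeps growing over the period in which throughput degrades. The URL, port 15672 and `guest`/`guest` credentials below are assumptions (the plugin's usual local defaults), and `looks_like_leak` is a hypothetical, deliberately crude heuristic, not a RabbitMQ feature.

```python
import base64
import json
import urllib.request

# Assumed defaults for a local management plugin; adjust for your deployment.
MGMT_URL = "http://localhost:15672/api/overview"


def fetch_object_totals(url=MGMT_URL, user="guest", password="guest"):
    """Return the `object_totals` dict (connections, channels, queues, ...)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["object_totals"]


def looks_like_leak(samples, key="connections", min_growth=1):
    """Hypothetical heuristic over samples taken at regular intervals.

    Flags a possible leak if `key` never decreases between consecutive
    samples and grows by at least `min_growth` from first to last.
    """
    values = [s[key] for s in samples]
    if len(values) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(values, values[1:]))
    return never_drops and values[-1] - values[0] >= min_growth
```

To use it, collect `fetch_object_totals()` samples every few minutes for roughly the period over which the slowdown was reported (about a day), then call `looks_like_leak(samples, "connections")` and `looks_like_leak(samples, "channels")`. A count that keeps climbing while the sending processes come and go would support the server-side leak hypothesis. Flat counts do not rule out other leaks such as memory or file descriptors, so those are worth watching as well.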
A quick search reveals similar behaviour observed in some old versions of rabbitmq: https://groups.google.com/g/rabbitmq-users/c/v630G6OCxuU (I have not looked at the details of that conversation; it just seems likely that it might be something similar). Can you please take a close look at your rabbitmq side and check whether you can observe resource leaks when you go back to the current solution? Maybe also upgrade rabbitmq to the latest version to exclude the possibility that it is some old bug.

--
This is an automated message from the Apache Git Service.
