potiuk commented on PR #33117: URL: https://github.com/apache/airflow/pull/33117#issuecomment-1677144580
Can you perform some calculations and realistic simulations showing how this would help achieve better performance? I think there is a big risk that your idea of horizontal scalability is there only for the sake of implementing horizontal scalability. Does it solve an actual problem, enable some use cases, and make things better and more performant? With S3 and Celery as the broker of the messages, I am not sure the overhead connected with it would justify the potential gains. Only realistic cases and performance benchmarks could justify it. Also, if you want to propose it, you have to consider alternatives and see how they compare, both performance-wise and complexity-wise.

And it's not an abstract ask. This is precisely what we did when we implemented horizontal scalability for the Scheduler. Read this post, which summarizes all the effort that went into it: https://www.astronomer.io/blog/airflow-2-scheduler/ The post is just the icing on the cake; it is the result of numerous discussions we had on the devlist, of discussing draft Airflow Improvement Proposals, and of prototype code walkthroughs with Ash, where he explained to us how the idea works. The blog post explains why we implemented it and why we made the decisions we did. They were based on observations and benchmarks showing that we had a real bottleneck in some realistic cases. Then prototyping was done, along with benchmarking of several approaches we could take. Finally, the current "Database locking" mechanism with SKIP LOCKED was chosen, based on those benchmarks and analysis, as a good balance between simplicity and achieved gains.

HINT: we chose the DB because it did not require adding any components, such as messaging/communication queues, ZooKeeper, or the many other things we could have chosen that would awfully complicate Airflow deployments. So, by the sheer look of it, your proposal goes in the opposite direction compared to what we decided for the Scheduler.
That does not mean it is wrong, but it's a hint that it's likely not the preferred one. But if you perform an analysis of the performance/gains/risks/complexity of this, compare it with the alternative solutions you considered, discuss it on the devlist, turn it into an Airflow Improvement Proposal, pass it through voting, and implement it, then of course there is a possibility that this might be something the community will be willing to accept.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
