potiuk commented on issue #28372: URL: https://github.com/apache/airflow/issues/28372#issuecomment-1355901746
> > You can limit the number of concurrent task run by airflow thanks to airflow pool
> > or concurrency settings on the dag
>
> Ya perhaps this is not necessarily a bug from airflow then but more of a feature request. What you suggested is a workaround to my problem, thanks for the suggestion.
>
> I think it would be pretty cool if Airflow scheduler was aware of resources + auto scaling capability of a cluster, and then schedule accordingly (i.e. keep running jobs, and schedule the remainder that no resources can possibly be allocated for).

This is actually not a workaround. This is how you are supposed to limit resources in Airflow when you use the Kubernetes Pod Operator. Using the Kubernetes Pod Operator and expecting Airflow to understand resource limits coming from autoscaling of the cluster it runs on would basically mean that Airflow would have to copy the whole scheduling logic of Kubernetes in order to know what it can and cannot schedule.

I am not sure if you are aware, but there are plenty of things Kubernetes takes into account when scheduling pods, and many of them have very complex logic. It's not only memory, but also affinities, anti-affinities, labels that do or do not match the nodes a pod could run on, and plenty of others. For example, imagine you have 20 KPOs each requiring a GPU and only 2 GPUs available. And this is only one of the cases. Duplicating the whole logic of K8s in Airflow is not only difficult but also error-prone, and it would mean that Airflow's KPO would be closely tied to a specific version of K8s, because new features of K8s are added with each release.

What you ask for is not really feasible. You might think it is simple for your specific case because you just **know** you have 2 CPUs per node and you know you have 6 nodes in total, so it must be simple for Airflow to know it... But in fact Airflow would have to implement very complex logic to know it in the general case.
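To make the pool suggestion concrete, here is a minimal sketch (the pool name, slot counts, and image are illustrative assumptions, not from this thread, and the KPO import path varies with the cncf-kubernetes provider version). You create a pool sized to what you know about your cluster, e.g. `airflow pools set cluster_cpu_slots 12 "2 CPUs x 6 nodes"`, and then have every KPO task draw slots from it:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(dag_id="kpo_with_pool", start_date=datetime(2022, 12, 1), schedule=None) as dag:
    for i in range(20):
        KubernetesPodOperator(
            task_id=f"job_{i}",
            name=f"job-{i}",
            image="busybox",
            cmds=["sh", "-c", "echo working"],
            pool="cluster_cpu_slots",  # tasks beyond the pool's 12 slots stay queued
            pool_slots=2,              # each task consumes 2 slots (~2 CPUs in this sizing)
        )
```

This way the scheduler never launches more pods than the slots you declared, no matter how many tasks become runnable at once — your knowledge of the cluster is encoded in the pool size instead of re-derived from K8s.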
And by providing the pool you ACTUALLY pass your knowledge to Airflow, and it indeed knows what the limits are without performing all the complex and brittle K8s logic. We do not really want to re-implement K8s in Airflow.

But you can do better than manually allocating a fixed pool of resources for your workloads, and Airflow has you covered. If you really want to do scaling, then what you can do is use the Celery Executor running on K8s. As surprising as it may sound, this is a pretty good way to implement K8s auto-scaling. This is precisely what the Celery Executor was designed for: especially if you have relatively short tasks which are similar to each other in terms of complexity, the CeleryExecutor is the way to go rather than running tasks through KPOs. We have KEDA-based auto-scaling implemented in our Helm Chart, and if you run it on top of an auto-scaling K8s cluster, it will actually be able to handle autoscaling well.

You can even combine it with long-running Kubernetes tasks: run the CeleryKubernetesExecutor and choose which tasks run where. Again, in this case you need to manage queues to direct your load, but those queues can then dynamically grow in size if you want.
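For reference, here is a sketch of the Helm chart values that turn on the KEDA-based autoscaling mentioned above (the replica counts and intervals are illustrative, not recommendations):

```yaml
# values.yaml fragment for the official Apache Airflow Helm chart
executor: CeleryExecutor        # or CeleryKubernetesExecutor for mixed workloads

workers:
  keda:
    enabled: true               # KEDA scales workers from the queued/running task count
    minReplicaCount: 0
    maxReplicaCount: 10
    pollingInterval: 5          # seconds between checks of the task queue depth
    cooldownPeriod: 30          # seconds to wait before scaling back down
```

KEDA watches the Airflow metadata database and adds or removes Celery worker replicas as the queue grows and shrinks; on an autoscaling K8s cluster the node pool then follows the workers.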

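And for the CeleryKubernetesExecutor setup, routing is just a per-task `queue` argument — a hedged sketch with made-up task ids and commands (`"kubernetes"` is the default queue name, configurable via `[celery_kubernetes_executor] kubernetes_queue`):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="mixed_executors", start_date=datetime(2022, 12, 1), schedule=None) as dag:
    # Default queue: picked up by the Celery workers (good for short, uniform tasks).
    BashOperator(task_id="short_task", bash_command="echo quick")

    # "kubernetes" queue: handed to the Kubernetes executor, which runs the task
    # in its own pod (good for long-running or resource-hungry tasks).
    BashOperator(task_id="long_task", bash_command="sleep 3600", queue="kubernetes")
```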