potiuk commented on issue #28372:
URL: https://github.com/apache/airflow/issues/28372#issuecomment-1355901746

   > > You can limit the number of concurrent task run by airflow thanks to 
airflow pool
   > > or concurrency settings on the dag
   > 
   > Ya perhaps this is not necessarily a bug from airflow then but more of a 
feature request. What you suggested is a workaround to my problem, thanks for 
the suggestion.
   
   > I think it would be pretty cool if Airflow scheduler was aware of 
resources + auto scaling capability of a cluster, and then schedule accordingly 
(i.e. keep running jobs, and schedule the remainder that no resources can 
possibly be allocated for).
   
   This is actually not a workaround. This is how you are supposed to limit resources in Airflow when you use the KubernetesPodOperator.
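   For reference, a minimal configuration sketch of what that looks like (the DAG id, image, and pool name are made up; the pool `k8s_pool` with a fixed number of slots would be created beforehand via the UI or `airflow pools set`):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

# "k8s_pool" is assumed to exist with e.g. 6 slots, so at most 6 of these
# pods run at once, no matter how many task instances are queued.
with DAG(
    dag_id="pool_limited_pods",
    start_date=datetime(2023, 1, 1),
    schedule=None,
) as dag:
    for i in range(20):
        KubernetesPodOperator(
            task_id=f"pod_task_{i}",
            name=f"pod-task-{i}",
            image="my-image:latest",  # hypothetical image
            pool="k8s_pool",          # caps concurrency at the pool's slot count
        )
```

   Airflow never starts more pods than the pool has slots, so the cluster is never asked for more than you told Airflow it can handle.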
   
   Using the KubernetesPodOperator and expecting Airflow to understand resource limits coming from autoscaling of the cluster it runs on would basically mean that Airflow would have to copy the whole scheduling logic of Kubernetes to know what it can and cannot schedule. I am not sure if you are aware just how many things Kubernetes takes into account when scheduling pods - and many of them have super complex logic. It's not only memory, but also affinities, anti-affinities, labels that do or do not match the nodes a pod could run on, and plenty of others. For example, imagine you have 20 KPOs each requiring a GPU and only two GPUs available - and this is only one of the cases. Duplicating the whole logic of K8S in Airflow is not only difficult but also error-prone, and it would tie Airflow's KPO closely to a specific version of K8S, because new scheduling features are added with each K8S release. What you ask for is not really feasible.
   
   You might think it is simple for your specific case because you just **know** you have 2 CPUs per node and you know you have 6 of them in total, so it must be simple for Airflow to know it too... But in the general case Airflow would have to implement very complex logic to figure that out. By defining a Pool you ACTUALLY pass your knowledge to Airflow, and it then knows the limits without performing all of that complex and brittle K8S logic.
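   The pool mechanism itself is conceptually simple - a named, fixed number of slots that each task instance holds for the whole time it runs. A stdlib-only sketch of those semantics (this is an illustration, not Airflow code):

```python
import threading
import time

def run_with_pool(n_tasks, pool_slots):
    """Run n_tasks dummy tasks through a pool of pool_slots slots and
    return the peak number of tasks that ran at the same time."""
    sem = threading.Semaphore(pool_slots)  # the "pool": a fixed slot count
    lock = threading.Lock()
    running = 0
    peak = 0

    def task():
        nonlocal running, peak
        with sem:  # take a pool slot, or wait until one frees up
            with lock:
                running += 1
                peak = max(peak, running)
            time.sleep(0.01)  # stand-in for the real work (e.g. a pod run)
            with lock:
                running -= 1

    threads = [threading.Thread(target=task) for _ in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

   However many tasks you queue, concurrency never exceeds the slot count - which is exactly the guarantee a pool gives you without Airflow having to inspect the cluster.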
   
   We do not really want to re-implement K8S in Airflow.
   
   But you can do better than manually allocating a fixed pool of resources for your workloads - and Airflow has you covered.
   
   If you really want to do scaling, you can run the CeleryExecutor on K8S. Surprising as it may sound, this is a pretty good way to implement K8S autoscaling - it is precisely what the CeleryExecutor was designed for. Especially if you have relatively short tasks of similar complexity, the CeleryExecutor is the way to go rather than running tasks through KPOs. We have KEDA-based autoscaling implemented in our Helm Chart, and if you run it on top of an autoscaling K8S cluster, it will actually handle autoscaling well. You can even combine it with long-running Kubernetes tasks by using the CeleryKubernetesExecutor and choosing which tasks run where.
   
   Again - in this case you need to manage queues to direct your load, but those queues can then grow dynamically if you want them to.
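   With the CeleryKubernetesExecutor, that routing is a per-task `queue` argument: tasks sent to the queue named in `[celery_kubernetes_executor] kubernetes_queue` (by default `kubernetes`) each get their own pod, and everything else goes to the (KEDA-scaled) Celery workers. A hedged configuration sketch - task ids and commands are made up:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="mixed_queues",
    start_date=datetime(2023, 1, 1),
    schedule=None,
) as dag:
    # Short, numerous task: default queue, picked up by Celery workers
    # that KEDA scales with the queue depth.
    short = BashOperator(task_id="short_task", bash_command="echo short")

    # Long-running task: routed to the "kubernetes" queue, so the
    # CeleryKubernetesExecutor runs it in its own pod instead.
    long_running = BashOperator(
        task_id="long_task",
        bash_command="echo long",
        queue="kubernetes",
    )

    short >> long_running
```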

