[GitHub] [airflow] potiuk commented on pull request #25101: Incorrect definition of parallelism in default_airflow.cfg

GitBox Sun, 17 Jul 2022 10:58:28 -0700


potiuk commented on PR #25101:
URL: https://github.com/apache/airflow/pull/25101#issuecomment-1186580863


   Just to answer the question: well, why not, actually? I am not sure whether 
I am right or wrong, but I could argue this way:
   
   You have Queues (and a default queue size) that already define the "resource" 
usage. What's more, you can mark heavier tasks as taking more slots in the 
queue (so, for example, if your task uses 4 CPUs it can take 4 slots in the 
queue). Queues are really the way to define the "resource" binding of each 
task (also because you can have different queues, and each queue can be bound 
to different resources or even a different executor (CeleryKubernetes)).
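
   A minimal sketch of the slot accounting described above, in plain Python 
rather than Airflow code. `SlotQueue` and its methods are hypothetical names 
made up for illustration. In Airflow itself, slot counts are set through the 
`pool` and `pool_slots` task arguments, so this only models the idea that a 
heavier task reserves more slots:

```python
from dataclasses import dataclass


@dataclass
class SlotQueue:
    """Hypothetical stand-in for a queue with a fixed number of slots."""

    name: str
    total_slots: int
    used_slots: int = 0

    def try_acquire(self, slots: int) -> bool:
        """Reserve `slots` if they still fit; return whether the task may start."""
        if self.used_slots + slots > self.total_slots:
            return False
        self.used_slots += slots
        return True

    def release(self, slots: int) -> None:
        """Give slots back when a task finishes."""
        self.used_slots = max(0, self.used_slots - slots)


# An 8-slot queue: two 4-CPU tasks fill it, so even a 1-slot task must wait.
cpu_queue = SlotQueue(name="cpu", total_slots=8)
started = [
    cpu_queue.try_acquire(4),
    cpu_queue.try_acquire(4),
    cpu_queue.try_acquire(1),
]
```

Here the limit is expressed in resource units (slots), so two heavy tasks can 
exhaust the queue regardless of how many tasks are running.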
   
   Parallelism is different. It tells the scheduler to stop scheduling (and 
thus managing) new tasks once X tasks are already running under the control of 
the executor used by that scheduler. So what it accounts for is the extra 
effort the scheduler needs to manage and control more running tasks, not how 
many resources they take. And it makes sense to make it per-scheduler, as it is 
"scheduler resource" bound rather than "worker resource" bound. 
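
   To illustrate the distinction, here is a toy model in plain Python, not 
Airflow's implementation. `ToyScheduler` and its fields are hypothetical names. 
It assumes the cap counts running tasks per scheduler and ignores how many 
slots each task takes:

```python
from dataclasses import dataclass


@dataclass
class ToyScheduler:
    """Hypothetical scheduler that caps the number of tasks it manages."""

    parallelism: int
    running: int = 0

    def start_task(self, slots: int = 1) -> bool:
        """Start a task unless this scheduler already manages `parallelism` tasks.

        `slots` is deliberately ignored: the cap counts tasks, not resources.
        """
        if self.running >= self.parallelism:
            return False
        self.running += 1
        return True


# Two schedulers, each with parallelism=2, each offered three heavy tasks.
schedulers = [ToyScheduler(parallelism=2), ToyScheduler(parallelism=2)]
results = [s.start_task(slots=4) for s in schedulers for _ in range(3)]
total_running = sum(s.running for s in schedulers)
```

Under these assumptions each scheduler accepts two tasks and refuses the third, 
so the total across the deployment is the per-scheduler cap times the number of 
schedulers.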
   
   Maybe the names could be different, but I believe that was the original 
intention of why it was implemented like that.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
