potiuk commented on PR #25101: URL: https://github.com/apache/airflow/pull/25101#issuecomment-1187632517
> My concern with tying it to number of schedulers is that then the limit on concurrent tasks is implicit (num_schedulers * parallelism) instead of being directly visible. It also means that deciding you don't need HA (for example) means you have to change the configuration even if one scheduler actually could handle the full load of tasks.

Could you please explain, @collinmcnulty, what exactly you propose to do? Let me explain how I see it, at least, and what I propose we do about it - but if you have a concrete proposal I am all ears :)

> It seems like surprising behavior, which I would want to avoid.

Yep. I agree. That's why I think this is really a matter of naming and of moving the configuration to the proper section - not of changing behaviour - to be honest, nothing else. Precisely to avoid confusion.

There are two ways you can modify the task limits as I see it (and they are good, though they may need a bit of reshuffling of parameters and a better explanation):

* https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#default-pool-task-slot-count - limits the total number of "pool slots" you can have (that's the initial value; you actually modify it later by increasing the default pool size in the DB, not in the config). These are "slots", not tasks. Again - each task can take more than 1 slot. This makes sense because slots are bound to the resources needed for task execution.
* num_schedulers * parallelism (scheduler_parallelism? executor_parallelism? would likely be a better name) - is the total number of tasks the schedulers can control (and this one controls the "Executor" side, not "worker resources").
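
To make the two limits concrete, here is a minimal sketch of the arithmetic only. It is not Airflow code: the function names are hypothetical, the numbers are illustrative, and it assumes every task takes the same number of pool slots, which real deployments need not do.

```python
def executor_task_cap(num_schedulers: int, parallelism: int) -> int:
    """Implicit "Executor" side cap: each scheduler contributes `parallelism` tasks."""
    return num_schedulers * parallelism


def pool_task_cap(pool_slots: int, slots_per_task: int = 1) -> int:
    """Resource side cap: how many tasks fit in the pool if each takes `slots_per_task`."""
    if slots_per_task < 1:
        raise ValueError("slots_per_task must be at least 1")
    return pool_slots // slots_per_task


def effective_task_cap(
    num_schedulers: int,
    parallelism: int,
    pool_slots: int,
    slots_per_task: int = 1,
) -> int:
    """Whichever of the two limits is reached first bounds the concurrent tasks."""
    return min(
        executor_task_cap(num_schedulers, parallelism),
        pool_task_cap(pool_slots, slots_per_task),
    )


if __name__ == "__main__":
    # Illustrative values only (assumptions, not recommendations).
    print(effective_task_cap(num_schedulers=2, parallelism=32, pool_slots=128))
    # Dropping HA (one scheduler) silently halves the executor side cap.
    print(effective_task_cap(num_schedulers=1, parallelism=32, pool_slots=128))
    # Tasks that take 4 slots each make the pool the tighter limit.
    print(effective_task_cap(num_schedulers=2, parallelism=32, pool_slots=128, slots_per_task=4))
```

The second call shows the concern quoted above: removing a scheduler changes the task cap even though no parallelism setting was touched.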
