jscheffl commented on PR #43737: URL: https://github.com/apache/airflow/pull/43737#issuecomment-2463373135
> > Should that capacity not be a task parameter rather than an executor config parameter at DAG level? We have a similar concept with `pool_slots`, and there they are "per task" and part of the BaseOperator. It seems way more flexible to specify it this way (it could then also be renamed to "task_slots", to mirror "pool_slots"), or maybe we should even combine the two. This way it would also be potentially usable by other executors.
>
> I was thinking the same way, but during coding I saw that the needed-capacity parameter is not easy to get into the executor. We would have to touch core code like the `TaskInstanceKey` class to get the info into the executor. I used the idea of the KubernetesExecutor to add additional data into the executor, and that is the reason why I started using the `executor_config` parameter. My main idea is to touch only the Edge package and then make a later PR which adds these changes into the core, because the Edge package is the only one which will support this feature for the moment, and it is not released yet. So what is your opinion about that? Shall we change this also in this PR or in a separate one? While writing these lines I also have the feeling we should stick with the term concurrency instead of capacity; then it is easier to adapt this to the already existing executor code in the future.

@potiuk I had a longer talk with @AutomationDev85 today about this. Reading the docs at https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/kubernetes_executor.html#pod-override, this field is used (and is only used there today) to carry a dict with an optional `pod_override` element that can define extra details of the POD to run for the task execution. That can be used to add volume mounts, request resources, or add sidecars... whatsoever. With this interface, more parameters can be carried generically. An additional field in the dict would not do any harm.
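To make the idea concrete, here is a minimal sketch of how an extra field could ride along in the `executor_config` dict next to the existing `pod_override` convention. This is plain illustrative Python, not real Airflow code; the key name `capacity_slots` and the helper `extract_capacity` are hypothetical.

```python
# Sketch: executor_config is a plain dict, so an additional key can sit
# next to pod_override without affecting other executors.
# "capacity_slots" is a hypothetical key name, not an existing Airflow field.

def extract_capacity(executor_config, default=1):
    """Read a hypothetical slot count from executor_config, defaulting to 1."""
    if not executor_config:
        return default
    return int(executor_config.get("capacity_slots", default))

# A task author would pass the dict the same way pod_override is passed today:
config = {
    "pod_override": None,   # placeholder for the KubernetesExecutor pod spec
    "capacity_slots": 4,    # hypothetical extra field for the Edge executor
}
print(extract_capacity(config))  # → 4
print(extract_capacity(None))    # → 1
```

An executor that does not know the key simply ignores it, which is why adding a field here is not a breaking change.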
I am thinking that maybe instead of "needs capacity" we should name it `pool_slots`, matching the task instance parameter. With this PR here, you would need to define this as an extra field on the task instance... but with a small follow-up PR we could bring the `pool_slots` from the task instance in there per default for future leverage... but then this intrinsic is a bit confusing though.

As @AutomationDev85 said, we could also bring the `pool_slots` field directly from the task instance into the executor, but today the interface in the scheduler in `airflow/executors/base_executor.py:execute_async()` only carries the `TaskInstanceKey`, the command, the queue, and the `executor_config`. Adding the full `TaskInstance` or the `pool_slots` here would be a breaking change in the interface, or the executor would need an additional DB query to get the `pool_slots` (which the scheduler obviously already has, because it allocated the pool slots before scheduling; the calling method `_process_tasks()` has the `TaskInstance` object).

@potiuk Do you think we need/should make a breaking change in the scheduler/executor interface, or add an intrinsic? @ashb, as the scheduler expert, would you have an opinion on this?
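To illustrate the non-breaking option, here is a sketch of an executor reading `pool_slots` out of `executor_config` within today's `execute_async()` interface. The classes below are simplified stand-ins, not the real Airflow `BaseExecutor` or `TaskInstanceKey`, and the dispatch logic is invented for the example.

```python
# Sketch of the non-breaking option: simplified stand-in classes only.
from typing import NamedTuple

class TaskInstanceKey(NamedTuple):
    # Simplified stand-in for Airflow's TaskInstanceKey.
    dag_id: str
    task_id: str
    run_id: str
    try_number: int

class BaseExecutorSketch:
    # Today's interface, per the discussion above: only the key, the command,
    # the queue, and the executor_config cross this boundary.
    def execute_async(self, key, command, queue=None, executor_config=None):
        raise NotImplementedError

class EdgeExecutorSketch(BaseExecutorSketch):
    def execute_async(self, key, command, queue=None, executor_config=None):
        # Non-breaking: derive the slot count from executor_config and fall
        # back to 1 when the task author did not set it. No DB query and no
        # signature change required.
        slots = (executor_config or {}).get("pool_slots", 1)
        self.last_dispatch = (key, slots)  # stand-in for the real dispatch

key = TaskInstanceKey("my_dag", "my_task", "run_1", 1)
executor = EdgeExecutorSketch()
executor.execute_async(key, ["airflow", "tasks", "run"],
                       executor_config={"pool_slots": 2})
print(executor.last_dispatch[1])  # → 2
```

The breaking alternative would extend the `execute_async()` signature itself (e.g. with the full `TaskInstance`), which every executor implementation would then have to follow.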
