jscheffl commented on PR #43737:
URL: https://github.com/apache/airflow/pull/43737#issuecomment-2463373135

   > > Should not that capacity be a task parameter rather than executor config 
parameter on DAG level. We have similar concept with `pool_slots` and there 
they are "per task" - and part of the BaseOperator. It seems to be way more 
flexible to specify it this way (additionally then this could be renamed as 
"task_slots" - to be similar to "pool_slots") or maybe even we should combine 
the two. This way it will also be potentially usable by other executors.
   > 
   > I was thinking in the same way, but during coding I saw that the 
need_capacity parameter is no easy to get into the executor. We have to tough 
core code like TaskInstanceKey class to get the info into the Executor. I just 
used the idea of the KubernetesExecutor to add additional data into the 
executor and that is the reason why I started using the executor_config 
parameter. My main idea is to tough only Edge package t and then make a later 
PR which can add this changes into the core because the Edge package is the 
only which will support this feature for the moment and it is not released yet. 
So what is your opinion about that? Shall we change this also in this PR or in 
a separate PR? During writing this lines I have also the feeling to use still 
the term concurrency instead of capacity. Then it is easier to adapt this to 
already existing Executor code in the future.
   
   @potiuk I had also a longer talk to @AutomationDev85 today about this. 
Reading the docs from 
https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/kubernetes_executor.html#pod-override
 this field is used (and is only used there today) to carry a dict with a 
potential included of `pod_override` element that can define extra details of 
the POD to run for the task execution. That can be used to add volume mounts, 
request resources or add sidecars... whatsoever.
   
   With this interface generically more parameters can be carried. An 
additional field in the dict would not harm. I am thinking that maybe instead 
of "needs capacity" we should name it `pool_slots` according to the task 
instance parameter. With this PR here, you would need to define this as extra 
field on the task instance... but with a small additional PR we could bring the 
`pool_slots` from the task instance per default in there for future leverage... 
but then this intrinsic is a bit confusing though.
   
   As @AutomationDev85 said we could also bring the `pool_slots` field directly 
from the task instance into the executor, but today the interface in the 
scheduler in airflow/executors/base_executor.py:execute_async() only carries 
TaskInstanceKey, Command, queue and the executor_config - adding the full 
TaskInstance or the pool_slots here would be a breaking change in the interface 
or the executor would need to query the DB additionally to get the pool_slots 
(which the scheduler obviously already has because it allocated the pool slots 
before scheduling... the calling method `_process_tasks()` has the taskinstance 
object).
   
   @potiuk Do you think we need/should to make a breaking change in the 
scheduler/executor interface or add an intrinsic?
   @ashb as being the Scheduler expert, would you have an opinion on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to