o-nikolas commented on PR #43737: URL: https://github.com/apache/airflow/pull/43737#issuecomment-2469290901
> I want to hear other's opinion on that @ashb @o-nikolas ? and mark it as request changes for now until we agree if this is fine or not to keep it as "executor-specific" config. Here are my thoughts: ## 1) _Can_ we use executor_config in this way? The `executor_config` field is there to pass any key/value configuration to the executor for a specific task. The k8s executor uses it as already discussed, the ECS executor treats it as AWS parameters which are passed along to the boto3 call to ecs `run_task` (so the user can configure more memory for a particular task, or send it to a different ECS cluster, or simply add tags, etc). So it is free to be used in any way an executor sees fit and it's a nice mechanism to customize execution for a task. ## 2) _Should_ we use `executor_config` in this way for capacity management I think this idea of capacity management is interesting and agree we have to make a call on whether or not we should adopt this as a firstclass feature or not within the Airflow executor interface. I agree with @potiuk that there is no harm in updating the executor interface to support it. We have a major release coming up and that's what they're for, so if we want to update the `exec_async` method now to include more parameters (or even pass along the entire TI as @jscheffl suggested) that's perfectly doable. But whether we **should** is another story. Here I think it's less obvious. Many executors don't even have a notion of long lived "workers" (they are completely ephemeral and map 1:1 with a single task for any containerized executor like k8s and AWS ECS). But they don't have to make use of it, or you could imagine it as a way to map to a larger container image configuration for executors like this (i.e. a two slot task would get a larger container with more memory or what have you, pre-defined by the user). One last thing that I do feel strongly about (and I think this also agrees with what @potiuk is saying) is that Airflow already has slots and executors also _already_ have a notion of their own slots (kind of like their internal capacity), which they use to manage how many tasks they can run concurrently/in parallel. I think we should build on this existing mechanism (i.e. a larger task should consume >1 executor slots for the executor that receives it) and then within the executor, knowing that a task is consuming a particular number of slots, the executor can handle that task especially/differently (or do nothing at all if the executor doesn't care about task size). This pushes this to be a Task Instance property which I think makes the most sense, because it is the task that is taking up more capacity and the executor just needs to account for that and react appropriately. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
