Re: [PR] [Edge] Edge worker supports capacity handling instead of concurrency [airflow]

via GitHub Mon, 11 Nov 2024 15:37:20 -0800


o-nikolas commented on PR #43737:
URL: https://github.com/apache/airflow/pull/43737#issuecomment-2469290901


   > I want to hear other's opinion on that @ashb @o-nikolas ? and mark it as 
request changes for now until we agree if this is fine or not to keep it as 
"executor-specific" config.
   
   
   Here are my thoughts:
   
   ## 1) _Can_ we use executor_config in this way?
   
   The `executor_config` field is there to pass any key/value configuration to 
the executor for a specific task. The k8s executor uses it as already 
discussed, the ECS executor treats it as AWS parameters which are passed along 
to the boto3 call to ecs `run_task` (so the user can configure more memory for 
a particular task, or send it to a different ECS cluster, or simply add tags, 
etc). So it is free to be used in any way an executor sees fit and it's a nice 
mechanism to customize execution for a task.
   
   ## 2) _Should_ we use `executor_config` in this way for capacity management
   I think this idea of capacity management is interesting and agree we have to 
make a call on whether or not we should adopt this as a firstclass feature or 
not within the Airflow executor interface. I agree with @potiuk that there is 
no harm in updating the executor interface to support it. We have a major 
release coming up and that's what they're for, so if we want to update the 
`exec_async` method now to include more parameters (or even pass along the 
entire TI as @jscheffl suggested) that's perfectly doable.
   But whether we **should** is another story. Here I think it's less obvious. 
Many executors don't even have a notion of long lived "workers" (they are 
completely ephemeral and map 1:1 with a single task for any containerized 
executor like k8s and AWS ECS). But they don't have to make use of it, or you 
could imagine it as a way to map to a larger container image configuration for 
executors like this (i.e. a two slot task would get a larger container with 
more memory or what have you, pre-defined by the user). 
   
   One last thing that I do feel strongly about (and I think this also agrees 
with what @potiuk is saying) is that Airflow already has slots and executors 
also _already_ have a notion of their own slots (kind of like their internal 
capacity), which they use to manage how many tasks they can run concurrently/in 
parallel. I think we should build on this existing mechanism (i.e. a larger 
task should consume >1 executor slots for the executor that receives it) and 
then within the executor, knowing that a task is consuming a particular number 
of slots, the executor can handle that task especially/differently (or do 
nothing at all if the executor doesn't care about task size). This pushes this 
to be a Task Instance property which I think makes the most sense, because it 
is the task that is taking up more capacity and the executor just needs to 
account for that and react appropriately.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [Edge] Edge worker supports capacity handling instead of concurrency [airflow]

Reply via email to