Hi, I see there's already a discussion of the same idea here: https://github.com/apache/airflow/issues/13975 There's no AIP yet, though, so I'm considering writing a proposal so that people can seriously consider making this improvement.
The idea is that one task may depend on multiple resources, and we may want to make sure all of these resources are available before the task can run. Each type of resource is limited, and we may have a pool for each one. Currently Airflow uses only one pool per task to decide whether the task can run, so it doesn't cover the multi-resource case.

For example, a task may depend on both resource A and resource B, where A is memory and B is the number of connections. We set up Pool_A for resource A and Pool_B for resource B. Pool_A has 16 slots (e.g. 16GB of memory), while Pool_B has 10 slots (at most 10 connections). Different types of tasks use different amounts of memory (A) and connections (B). For instance, a task of type T1 needs 2 slots from Pool_A and 1 slot from Pool_B, while a task of type T2 needs 1 slot from Pool_A and 2 slots from Pool_B.

When we run a few T1 and T2 task instances together, we have a problem working out how many of each can run within the resource limits. If we assign T1 only to Pool_A and T2 only to Pool_B, Airflow can schedule 8 T1 tasks and 5 T2 tasks to run together, but at run time they can actually consume 8*2 + 5*1 = 21 slots' worth of resource A. Since A is memory and the slot unit is GB, Airflow is letting the running tasks occupy 21GB of memory when only 16GB is available, and the tasks will then fail at run time.

We need to make Airflow aware that T1 and T2 need both resource A and resource B, and that both pools must have enough available slots before a task can run. As suggested in the GitHub thread above, one proposal is to allow the pool and pool_slots arguments to take lists, and to have the Airflow scheduler check that every pool in the list has enough slots before it allows a task to run.

Please let me know what you think or if you have any suggestions. Thank you very much!

--
Best,
Kevin
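P.S. To make the numbers above concrete, here is a rough sketch of the "all pools must have room" check in plain Python. This is not Airflow code. The Pool class and the can_run / admit helpers are hypothetical names I made up for illustration, and a real scheduler change would look quite different. It only shows the admission logic with the Pool_A / Pool_B and T1 / T2 figures from the example.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical stand-in for a pool: a fixed number of slots, some in use."""
    name: str
    total_slots: int
    used_slots: int = 0

    @property
    def open_slots(self) -> int:
        return self.total_slots - self.used_slots


def can_run(needs: dict, pools: dict) -> bool:
    """True only if every pool the task needs has enough open slots."""
    return all(pools[name].open_slots >= slots for name, slots in needs.items())


def admit(needs: dict, pools: dict) -> bool:
    """Reserve slots in all required pools, or reserve nothing at all."""
    if not can_run(needs, pools):
        return False
    for name, slots in needs.items():
        pools[name].used_slots += slots
    return True


# Figures from the example: 16 slots of memory (A), 10 connections (B).
pools = {
    "Pool_A": Pool("Pool_A", total_slots=16),
    "Pool_B": Pool("Pool_B", total_slots=10),
}
T1 = {"Pool_A": 2, "Pool_B": 1}
T2 = {"Pool_A": 1, "Pool_B": 2}

# What the single-pool setup would let through: 8 T1 and 5 T2 together.
single_pool_demand_a = 8 * T1["Pool_A"] + 5 * T2["Pool_A"]

# With the multi-pool check, try to admit the same 8 T1 and 5 T2.
admitted_t1 = sum(admit(T1, pools) for _ in range(8))
admitted_t2 = sum(admit(T2, pools) for _ in range(5))

print("Pool_A demand under single-pool scheduling:", single_pool_demand_a)
print("admitted T1:", admitted_t1, "admitted T2:", admitted_t2)
print("Pool_A used:", pools["Pool_A"].used_slots,
      "Pool_B used:", pools["Pool_B"].used_slots)
```

In this sketch the T1 tasks are offered first, so they fill Pool_A and the T2 tasks have to wait even though Pool_B still has room. Which task gets offered first (priority, fairness, and so on) is a separate question that the proposal would still need to address.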