XD-DENG commented on PR #37778: URL: https://github.com/apache/airflow/pull/37778#issuecomment-1995139815
> A simple example on how this can be used (that cannot be covered otherwise) would be useful.

Hi @uranusjr, sure thing. Let me elaborate a bit more and get your opinions and input on this.

We have found that our TIs may have different extra dependencies in different scenarios. For example:

- Scenario 1: The TI may need a separate hardware resource, such as an external GPU or other acceleration hardware, rather than CPU/memory (this is very similar to the built-in deps `PoolSlotsAvailableDep` or `DagTISlotsAvailableDep`).
- Scenario 2: The TI is only supposed to start once an event has been identified. The event may be checked via an API, but it is not represented as a `Dataset` in Airflow, and adding an Operator to describe the dependency is not desirable.
- Scenario 3: Among all the TIs (which may not belong to the same DAG), we may want them to be executed in a certain customized order. By default, Airflow executes the TIs in a roughly "FIFO" order, or takes the TI `Priority Weight` into consideration. But we have a bigger idea in mind: based on the TIs' expected duration, the global concurrency we allow, and resource availability, we may want to reorder the execution of the TIs to achieve the best global efficiency.

The easiest way to achieve the ideas above, as far as my team can see, is to make sure we can add our custom TI deps to the DagRun's TI scheduling decision process.

I would love to hear what you think of this, or whether you have a good alternative solution to share.

Many thanks!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
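To make Scenario 1 from the comment concrete, here is a minimal, self-contained sketch of what a custom dependency check could look like. It deliberately does not use Airflow's classes: `FakeTI`, `DepStatus`, `GpuSlotAvailableDep`, and `schedulable` are all hypothetical names invented for illustration, and the `free_slots` callable stands in for whatever external system would report accelerator availability. It is only meant to show the shape of the idea (a dep yields a pass/fail status with a reason, and a TI is schedulable only if every dep passes), not the interface proposed in the PR.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class DepStatus:
    """Outcome of evaluating one dependency for one task instance."""
    dep_name: str
    passed: bool
    reason: str


@dataclass(frozen=True)
class FakeTI:
    """Hypothetical stand-in for a task instance; not Airflow's TaskInstance."""
    dag_id: str
    task_id: str
    needs_gpu: bool = False


class GpuSlotAvailableDep:
    """Scenario 1: pass only if an external accelerator slot is free."""

    name = "GPU Slot Available"

    def __init__(self, free_slots: Callable[[], int]) -> None:
        # Assumption: some external service can report the free slot count.
        self._free_slots = free_slots

    def check(self, ti: FakeTI) -> DepStatus:
        if not ti.needs_gpu:
            return DepStatus(self.name, True, "task does not need a GPU")
        free = self._free_slots()
        if free > 0:
            return DepStatus(self.name, True, f"{free} GPU slot(s) free")
        return DepStatus(self.name, False, "no GPU slot free")


def schedulable(tis: Iterable[FakeTI], deps: List[GpuSlotAvailableDep]) -> List[FakeTI]:
    """Return the task instances for which every dependency passes."""
    return [ti for ti in tis if all(dep.check(ti).passed for dep in deps)]


if __name__ == "__main__":
    tis = [FakeTI("etl", "load"), FakeTI("ml", "train", needs_gpu=True)]
    no_gpu = GpuSlotAvailableDep(free_slots=lambda: 0)
    for ti in schedulable(tis, [no_gpu]):
        print(ti.dag_id, ti.task_id)
```

Note that this sketch only reads the free slot count and never reserves a slot, so a real implementation would also need to account for several TIs passing the check at the same time.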
