jason810496 commented on issue #45995: URL: https://github.com/apache/airflow/issues/45995#issuecomment-2627098266
Hi @pierrejeambrun,

After some POC work and deeper investigation, I found that even when **"Sometimes a dag version can change in the middle of a dag run."** happens, all `TaskInstance`s of the same `DagRun` still have the same `dag_version`. This means that the idea **"Retrieve DagRun versions as an aggregate of the DagRun TIs versions (i.e., all different versions used for all TIs are the version of the DagRun)"** is unlikely to be necessary.

Here's my breakdown:

1. `dag_version` is passed as an argument to `dag.create_dagrun` and then passed down to `_create_orm_dagrun`.
2. The `DagRun` instance is created as `run` in `_create_orm_dagrun`, which then calls `run.verify_integrity`.
3. The `task_creator` factory, obtained from `self._get_task_creator`, creates instances of the `TaskInstance` model (abbreviated as `TIs` below). Importantly, the **`dag_version_id` of every TI is always taken from `self.dag_version_id`**, so every TI gets the same `dag_version` as its `DagRun`.
4. All `TIs` created by `task_creator` are persisted using `self._create_task_instances`.

In summary, I believe we don't need to remove the **direct link between `DagRun` and `DagVersion`**, and the return type of `dag_version` in this API will always be a single `UUID`.

Am I correct in this assumption? Or is there actually a scenario where **different `dag_versions` can be found among `TIs` in the same `DagRun`** that I haven't considered?

Thanks!

cc @ephraimbuddy
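
To make the breakdown above concrete, here is a minimal Python sketch of the creation path as I understand it. The classes are simplified stand-ins, not Airflow's actual models: only the names `create_dagrun`, `verify_integrity`, `_get_task_creator`, and `_create_task_instances` mirror the real code, while the signatures, the in-memory list standing in for persistence, and the `ti_versions` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set
from uuid import UUID, uuid4


@dataclass
class TaskInstance:
    # Simplified stand-in for the TaskInstance model.
    task_id: str
    run_id: str
    dag_version_id: UUID


@dataclass
class DagRun:
    # Simplified stand-in for the DagRun model.
    run_id: str
    dag_version_id: UUID
    task_instances: List[TaskInstance] = field(default_factory=list)

    def _get_task_creator(self) -> Callable[[str], TaskInstance]:
        # Step 3: the factory copies dag_version_id from the run itself.
        def create_ti(task_id: str) -> TaskInstance:
            return TaskInstance(task_id, self.run_id, self.dag_version_id)

        return create_ti

    def _create_task_instances(self, tis: Iterable[TaskInstance]) -> None:
        # Step 4: "persist" the TIs (an in-memory list in this sketch).
        self.task_instances.extend(tis)

    def verify_integrity(self, task_ids: Iterable[str]) -> None:
        # Step 2 calls this right after the run is created.
        task_creator = self._get_task_creator()
        self._create_task_instances(task_creator(t) for t in task_ids)


def create_dagrun(run_id: str, task_ids: List[str], dag_version_id: UUID) -> DagRun:
    # Step 1: the version is passed in once and stored on the run.
    run = DagRun(run_id=run_id, dag_version_id=dag_version_id)
    run.verify_integrity(task_ids)
    return run


def ti_versions(run: DagRun) -> Set[UUID]:
    # Hypothetical helper: the "aggregate of TI versions" from the proposal.
    return {ti.dag_version_id for ti in run.task_instances}


version = uuid4()
run = create_dagrun("manual_1", ["extract", "transform", "load"], version)
print(ti_versions(run) == {run.dag_version_id})
```

In this sketch the aggregate always collapses to the run's own version, which is the point of step 3. It only models the creation path described above; it says nothing about code paths that might update a TI's `dag_version_id` after creation, which is exactly the open question.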
