nathadfield commented on PR #61448: URL: https://github.com/apache/airflow/pull/61448#issuecomment-4004342067
> I still don't see the value of this. Running a dagrun on the latest or old version (the version the dagrun was created with initially) should be a dagrun clearing behaviour. Are you saying that if a Dag is created (when parsed) and it has bundle version 1, then after say 3 runs, and changing the bundle version 3 times, that when we set "run_on_latest_version=False", then every time a new dagrun is created, the first bundle version would be used? Is there a business case for this?

No, it doesn't pin to the first-ever version. `DagModel.bundle_version` is updated every time the DAG is re-parsed (see `dag_processing/collection.py:608`), so it always reflects the version the DAG was most recently parsed from.

The distinction is between two values:

- **DagModel.bundle_version**: the version the DAG was last parsed from
- **DagBundleModel.version**: the latest version the bundle knows about (e.g. the latest git commit)

These can differ temporarily. For example, a Git bundle detects a new commit and updates `DagBundleModel.version`, but the scheduler hasn't re-parsed the DAG files from that new commit yet. With `run_on_latest_version=False`, the dagrun is created with the parsed version (which the scheduler has validated). With True, it uses the latest bundle version even if parsing hasn't caught up.

But the primary motivation here is giving people control over the default behavior when rerunning DAGs and tasks. When a user clears a dagrun that was created with a specific bundle version, the UI shows a checkbox asking whether to rerun with the latest version or the version the original run used. The `run_on_latest_version` config controls the default state of that checkbox at the global and DAG level, so teams don't have to make that decision manually every time.

The business case is reproducibility. If you're debugging a failed run and clear it, False means the rerun uses the same code as the original run. True means it picks up whatever the latest version is, which could include unrelated changes that make it harder to isolate the issue. Different teams will have different preferences here, and this gives them a way to set that default without thinking about it on every clear/rerun.

I agree the docs need to be clearer about this. I'll rewrite the section to explain the distinction between "parsed version" and "latest version" properly and make the rerun use case more prominent.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
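
For readers following the thread, the behaviour described in the comment above can be modelled roughly as below. This is an illustrative sketch only, not Airflow's implementation: `VersionState`, `resolve_default`, `version_for_new_run`, and `version_for_rerun` are hypothetical names, and the assumption that a DAG-level setting overrides the global one is mine rather than something the comment states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VersionState:
    """Hypothetical stand-in for the two values discussed in the comment."""
    parsed_version: Optional[str]  # models DagModel.bundle_version
    latest_version: Optional[str]  # models DagBundleModel.version


def resolve_default(global_default: bool, dag_level: Optional[bool]) -> bool:
    # Assumption: a DAG-level setting, when present, wins over the global one.
    return global_default if dag_level is None else dag_level


def version_for_new_run(state: VersionState, run_on_latest_version: bool) -> Optional[str]:
    # False: use the version the DAG was last parsed from.
    # True: use the newest version the bundle knows about.
    return state.latest_version if run_on_latest_version else state.parsed_version


def version_for_rerun(original_run_version: Optional[str],
                      state: VersionState,
                      use_latest: bool) -> Optional[str]:
    # Clearing a run: either keep the code the original run used,
    # or pick up the latest bundle version.
    return state.latest_version if use_latest else original_run_version


# A new commit was detected, but the DAG has not been re-parsed yet.
state = VersionState(parsed_version="abc123", latest_version="def456")
default = resolve_default(global_default=False, dag_level=None)
new_run_version = version_for_new_run(state, default)
rerun_version = version_for_rerun("abc123", state, use_latest=default)
```

With the default left at False, both the new run and the cleared run in this sketch stay on `abc123`; flipping it to True would move both to `def456`.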
