nathadfield commented on PR #61448:
URL: https://github.com/apache/airflow/pull/61448#issuecomment-4004342067

   > I still don't see the value of this. Running a dagrun on the latest or old 
version(the version the dagrun was created with initially) should be a dagrun 
clearing behaviour. Are you saying that if a Dag is created(when parsed) and it 
has bundle version 1, then after say 3 runs, and changing the bundle version 3 
times, that when we set "run_on_latest_version=False", then every time a new 
dagrun is created, the first bundle version would be used? Is there a business 
case for this?
   
   No, it doesn't pin to the first-ever version. `DagModel.bundle_version` is 
updated every time the DAG is re-parsed (see 
`dag_processing/collection.py:608`), so it always reflects the version the DAG 
was most recently parsed from.
   
   The distinction is between two values:
   - **DagModel.bundle_version**: the version the DAG was last parsed from
   - **DagBundleModel.version**: the latest version the bundle knows about 
(e.g. the latest git commit)
   
   These can differ temporarily. For example, a Git bundle detects a new commit 
and updates `DagBundleModel.version`, but the scheduler hasn't re-parsed the 
DAG files from that new commit yet. With `run_on_latest_version=False`, the 
dagrun is created with the parsed version (which the scheduler has validated). 
With `True`, it uses the latest bundle version even if parsing hasn't caught up.
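
   As a rough sketch of that selection (this is not the actual scheduler code; the function and argument names are made up for illustration, with `parsed_version` standing in for `DagModel.bundle_version` and `latest_version` for `DagBundleModel.version`):

```python
from typing import Optional


def bundle_version_for_new_run(
    parsed_version: Optional[str],
    latest_version: Optional[str],
    run_on_latest_version: bool,
) -> Optional[str]:
    """Pick the bundle version for a newly created dagrun (illustrative only)."""
    # parsed_version: version the DAG was last parsed from.
    # latest_version: newest version the bundle knows about (e.g. latest git commit).
    if run_on_latest_version:
        return latest_version
    return parsed_version


# The Git bundle has seen commit "def456", but the DAG was last parsed from "abc123".
print(bundle_version_for_new_run("abc123", "def456", run_on_latest_version=False))
print(bundle_version_for_new_run("abc123", "def456", run_on_latest_version=True))
```

   Once parsing catches up, both values are the same and the setting makes no difference for new runs.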
   
   But the primary motivation here is giving people control over the default 
behavior when rerunning DAGs and tasks. When a user clears a dagrun that was 
created with a specific bundle version, the UI shows a checkbox asking whether 
to rerun with the latest version or the version the original run used. The 
`run_on_latest_version` config controls the default state of that checkbox at 
the global and DAG level, so teams don't have to make that decision manually 
every time.
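
   Roughly, the clear/rerun flow looks like the sketch below. The helper names are hypothetical, and the precedence (an explicit DAG-level value overriding the global one) is my assumption for the example rather than a quote of the implementation:

```python
from typing import Optional


def default_rerun_on_latest(dag_setting: Optional[bool], global_setting: bool) -> bool:
    """Initial state of the "rerun with latest version" checkbox (illustrative)."""
    # Assumption: an explicit DAG-level value takes precedence over the global one.
    return global_setting if dag_setting is None else dag_setting


def bundle_version_for_rerun(
    original_run_version: str, latest_version: str, rerun_on_latest: bool
) -> str:
    """Bundle version a cleared dagrun is rerun with, given the checkbox value."""
    return latest_version if rerun_on_latest else original_run_version


# Global default is False, the DAG does not override it,
# and the user leaves the checkbox at its default when clearing.
checked = default_rerun_on_latest(dag_setting=None, global_setting=False)
print(bundle_version_for_rerun("abc123", "def456", checked))
```

   The user can still flip the checkbox on any individual clear; the config only decides what it starts as.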
   
   The business case is reproducibility. If you're debugging a failed run and 
clear it, `False` means the rerun uses the same code as the original run. `True` 
means it picks up whatever the latest version is, which could include unrelated 
changes that make it harder to isolate the issue. Different teams will have 
different preferences here, and this gives them a way to set that default 
without thinking about it on every clear/rerun.
   
   I agree the docs need to be clearer about this. I'll rewrite the section to 
explain the distinction between "parsed version" and "latest version" properly 
and make the rerun use case more prominent.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
