manipatnam commented on PR #64162:
URL: https://github.com/apache/airflow/pull/64162#issuecomment-4124253984

   > The changes around run_immediately=False and removing the implicit buffer 
feel more like a semantic change than a bug fix. The buffer logic looks quite 
intentional and provides a small grace window for slightly late scheduling, and 
now False strictly skips the past run. Also changing the default from False to 
True is prettty big shift. Now, I will caveat this by stating that I may be 
missing context, so I am happy to be corrected but a semantic shift like this 
in scheduler-adjacent code needs stronger justification than what you have 
provided.
   
   Thanks for the review @SameerMesiah97 
   
   **Why the auto-buffer was removed from `False`**
   
   In the old code, `False` and `None` had identical implementations — both 
applied the same
   10% grace window. So passing `False` did not guarantee skipping the past 
cron point; it
   could still run it if the scheduler evaluated within the buffer window. 
Since `timedelta`
   already exists for users who want a grace window, `False` is cleaned up to 
strictly mean
   what it says: always skip the past cron point.
   
   **Why the default changed from `False` to `True`**
   
   Because of the bug, `run_immediately=False` was silently ignored whenever 
`start_date` was
   set. Since almost every production DAG sets `start_date`, the default 
`False` always behaved
   like `True` for the vast majority of users in practice. Changing the default 
to `True` preserves
   what users have always observed. If we kept the default as `False` and only 
fixed the bug,
   every existing DAG without an explicit `run_immediately` would suddenly stop 
running the past
   cron point on first enable — a much larger breaking change.
   
   > I think this PR is mixing a legitimate bug fix i.e. ensuring that setting 
the start_date does not subvert run_immediately in the timetable logic, with a 
behavioral change that feels a bit opinionated. Always routing the first run 
through _calc_first_run() makes sense, since previously run_immediately could 
effectively be ignored when start_date was set which is clearly inconsistent.
   
   This is my actual pain point. I have updated the description as stated
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to