dstandish commented on PR #35887:
URL: https://github.com/apache/airflow/pull/35887#issuecomment-1843549691

   Well, let's assess and articulate what the actual behavior change is.  We'll 
need that anyway, however we categorize it.
   
   I took a look at your code and refactored a bit to better understand what 
the actual change is.
   
   Here's my version
   
   ```python
   from __future__ import annotations
   
   from datetime import datetime
   
   import pendulum
   
   from airflow.timetables.interval import CronDataIntervalTimetable
   
   # Zurich (Switzerland) is chosen since it is +1/+2 DST, making it a bit 
easier
   # to get my head around the mental timezone conversion.
   # In 2023, DST entered on 26th Mar, 2am local clocks (1am UTC) were turned
   # forward to 3am. DST exited on 29th Oct, 3am local clocks (1am UTC) were
   # turned backward to 2am (making the 2:XX hour fold).
   
   CET = "Europe/Zurich"
   
   
   def to_local_iso(val: str):
       return pendulum.parse(val).in_timezone(CET).isoformat()
   
   
   def run_check(starting_run, timetable, exp_utc_iso):
       print(f"when last run is\t\t\t{starting_run.in_timezone(CET)} / 
{starting_run}...")
       exp_local = to_local_iso(exp_utc_iso)
       print(f"then next run should be\t\t{exp_local} / {exp_utc_iso}")
       next_run = timetable._get_next(starting_run)
       if (dt := next_run.isoformat()) and dt == exp_utc_iso:
           print("and it is!")
       else:
           print(f"but instead it is\t\t\t{to_local_iso(dt)} / {dt}")
           delta = (pendulum.parse(dt) - 
pendulum.parse(exp_utc_iso)).in_minutes()
           print(f"this is {abs(delta)} minutes {'ahead of' if delta < 0 else 
'behind'} schedule")
       print()
   
   
   timetable_irregular = CronDataIntervalTimetable("0 0,3 * * *", 
timezone=pendulum.timezone(CET))
   starting_run1 = pendulum.datetime(2023, 3, 26, 0, tz=CET).in_timezone("UTC")
   run_check(
       starting_run=starting_run1,
       timetable=timetable_irregular,
       exp_utc_iso="2023-03-26T01:00:00+00:00",
   )
   starting_run2 = pendulum.datetime(2023, 10, 29, 0, tz=CET).in_timezone("UTC")
   expected_next_run = (
       pendulum.instance(datetime(2023, 10, 29, 3, fold=1, 
tzinfo=pendulum.timezone(CET)))
       .in_timezone("UTC")
       .isoformat()
   )
   assert expected_next_run == "2023-10-29T02:00:00+00:00"
   run_check(
       starting_run=starting_run2,
       timetable=timetable_irregular,
       exp_utc_iso=expected_next_run,  # runs on the second occurrence of 3am
   )
   ```
   
   and i think this is what the change boils down to...
   
   before the change this is the output:
   
   ```
   when last run is                     2023-03-26T00:00:00+01:00 / 
2023-03-25T23:00:00+00:00...
   then next run should be              2023-03-26T03:00:00+02:00 / 
2023-03-26T01:00:00+00:00
   but instead it is                    2023-03-26T04:00:00+02:00 / 
2023-03-26T02:00:00+00:00
   this is 60 minutes behind schedule
   I
   when last run is                     2023-10-29T00:00:00+02:00 / 
2023-10-28T22:00:00+00:00...
   then next run should be              2023-10-29T03:00:00+01:00 / 
2023-10-29T02:00:00+00:00
   but instead it is                    2023-10-29T02:00:00+01:00 / 
2023-10-29T01:00:00+00:00
   this is 60 minutes ahead of schedule
   ```
   
   Caveat, this is all very confusing and hard to wrap one's head around but 
that said, let's discusse the two cases.
   
   In case 1, spring forward, the next run is supposed to at 3am (2 hours after 
the last run, since an hour is skipped) but instead the next run is calculated 
as 4am, which is simply the wrong time by any assessment.  So one behavior 
change is, instead of running at 4am, it will run at 3am, the correct clock 
time.  This to me seems like a bugfix.
   
   Side note: we might also want to verify the behavior for a `0,2` schedule, 
since i think in spring ahead, there is never a 2am.
   
   In case 2, fall back, the next run is supposed to be the 3am.  In actuality, 
this is an unambiguous time reference.  At fall back, the clock only strikes 
3am one time.  I.e. after 2:59 am, it goes to 2:00 again -- not first to 3am 
and then back to 2am.  Which is to say, the next run is at the wrong clock time 
again -- 2am instead of 3am.  So again, to me this looks like a bugfix.
   
   The other thing to note is that we also observed this year that with fall 
back, the dag would actually get stuck (if catchup=True, then forever, or until 
next run is past transition otherwise).  So, presuming this fixes that, it's a 
bugfix on another layer.
   
   Note, I'm not arguing to "win", just trying to further our own 
understandings of what this is and how best to represent this to users.  Do I 
understand this correctly?  Do you think I'm describing the details correctly?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to