Hi Bolke,

This looks great.

We have had the requirement to run DAGs in different local time zones for a 
while. So far we have worked around the limitation at the DAG level to automate 
most of our DST switches.

How would the approach behave in the DST-Switch corner cases?

For the regular case, I understand that if start_date=datetime(2017, 1, 1, 8, 
30, 0, tzinfo="Europe/Amsterdam") and the schedule is "30 8 * * *", the DST 
switch would work as expected, and the DAG would get scheduled at 07:30 UTC 
in European winter and 06:30 UTC in European summer.
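
To make the regular case concrete, here is a minimal sketch that uses only 
Python's standard-library zoneinfo module (Python 3.9+), not Airflow itself. 
The helper local_to_utc is a hypothetical name, and ZoneInfo stands in for the 
tzinfo="Europe/Amsterdam" shorthand used in this thread. It only checks the 
UTC offsets, it says nothing about how the patch implements scheduling.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

AMSTERDAM = ZoneInfo("Europe/Amsterdam")


def local_to_utc(year, month, day, hour, minute):
    """Interpret a wall-clock time in Europe/Amsterdam and convert it to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=AMSTERDAM)
    return local.astimezone(timezone.utc)


winter_run = local_to_utc(2017, 1, 15, 8, 30)  # CET, UTC+1
summer_run = local_to_utc(2017, 7, 15, 8, 30)  # CEST, UTC+2

print(winter_run.strftime("%H:%M"))  # 07:30
print(summer_run.strftime("%H:%M"))  # 06:30
```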

However, if start_date=datetime(2017, 1, 1, 2, 30, 0, 
tzinfo="Europe/Amsterdam") and the schedule is "30 2 * * *", would we skip a 
nightly run in March and have two nightly runs in October?
This seems like the correct thing to do from a time zone logic point of view, 
although I can imagine that there are many operational use cases where the user 
wants something different.
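
The corner case can be checked with the standard library alone. The sketch 
below is an illustration, not the behaviour of the patch: classify_wall_time 
is a hypothetical helper, and it relies on the 2017 EU switch dates (26 March 
and 29 October). It shows that 02:30 does not exist on the March switch night 
and maps to two different UTC instants on the October switch night.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

AMSTERDAM = ZoneInfo("Europe/Amsterdam")


def classify_wall_time(naive, tz):
    """Return 'missing', 'ambiguous' or 'normal' for a naive wall-clock time in tz."""
    first = naive.replace(tzinfo=tz, fold=0)
    second = naive.replace(tzinfo=tz, fold=1)
    # A time inside the spring-forward gap does not survive a round trip via UTC.
    round_trip = first.astimezone(timezone.utc).astimezone(tz)
    if round_trip.replace(tzinfo=None) != naive:
        return "missing"
    # A time in the autumn overlap has a different UTC offset for each fold value.
    if first.utcoffset() != second.utcoffset():
        return "ambiguous"
    return "normal"


march_night = datetime(2017, 3, 26, 2, 30)
october_night = datetime(2017, 10, 29, 2, 30)

print(classify_wall_time(march_night, AMSTERDAM))    # missing
print(classify_wall_time(october_night, AMSTERDAM))  # ambiguous

# The two UTC instants that both read 02:30 on the October switch night.
first_pass = october_night.replace(tzinfo=AMSTERDAM, fold=0).astimezone(timezone.utc)
second_pass = october_night.replace(tzinfo=AMSTERDAM, fold=1).astimezone(timezone.utc)
print(first_pass.strftime("%H:%M"), second_pass.strftime("%H:%M"))  # 00:30 01:30
```

Whether a scheduler should skip, shift, or duplicate the run on those nights 
is a policy choice that the sketch leaves open.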

If start_date=datetime(2017, 1, 1, 8, 30, 0, tzinfo="Europe/Amsterdam") and 
the schedule is timedelta(days=14), would the schedule actually follow the DST 
switch?
There is some ambiguity in this case, depending on the timedelta(days=14) being 
understood as either “14 days in local calendar” or 14*24*60*60 seconds on the 
system clock.
I’m not sure what the expected behaviour should be in this case.
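
The two readings can be compared directly in plain Python. This is a sketch 
under assumptions: the start date of 19 March 2017 is chosen only so that the 
interval spans the spring switch, and nothing here reflects what Airflow does 
with timedelta schedules. Adding a timedelta to an aware datetime in Python is 
wall-clock arithmetic, so the second reading has to go through UTC.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

AMSTERDAM = ZoneInfo("Europe/Amsterdam")
INTERVAL = timedelta(days=14)

# Assumed start: one week before the 2017 spring switch (26 March).
start = datetime(2017, 3, 19, 8, 30, tzinfo=AMSTERDAM)

# Reading 1: "14 days in the local calendar". The wall-clock time is preserved.
calendar_next = start + INTERVAL

# Reading 2: 14 * 24 * 60 * 60 seconds on the system clock, computed in UTC.
elapsed_next = (start.astimezone(timezone.utc) + INTERVAL).astimezone(AMSTERDAM)

print(calendar_next.isoformat())  # 2017-04-02T08:30:00+02:00
print(elapsed_next.isoformat())   # 2017-04-02T09:30:00+02:00

# Real time between the two candidate run times, measured in UTC.
gap = elapsed_next.astimezone(timezone.utc) - calendar_next.astimezone(timezone.utc)
print(gap)  # 1:00:00
```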

Cheers,
Till


On 13.11.17, 19:47, "Ash Berlin-Taylor" <[email protected]> wrote:

    This sounds like an awesome change!
    
    I'm happy to review (will take a look tomorrow) but won't be a suitable 
tester as all our DAGs operate in UTC.
    
    -ash
    
    
    > On 13 Nov 2017, at 18:09, Bolke de Bruin <[email protected]> wrote:
    > 
    > Hi All,
    > 
    > I just want to make you aware that I am creating patches that make 
Airflow timezone aware. The gist of the idea is that Airflow internally will 
use and store UTC everywhere. This allows you to have start_date = 
datetime(2017, 1, 1, tzinfo="Europe/Amsterdam") and Airflow will properly take 
care of daylight saving time. If you are using cron we will make sure to 
always run at the exact time (end of interval of course) which you specify even 
when DST is in effect, e.g. 8.00am is always 8.00am regardless of whether a 
daylight saving time switch has happened. DAGs that don’t have a timezone 
associated get a default timezone that is configurable.
    > 
    > In AIRFLOW-288 I am tracking what needs to be done, but I am 80% there. 
As the patches are invasive, particularly in tests (basically everything needs 
a timezone) and less so in other areas, I would like to draw special attention 
to a couple of places where this has an impact.
    > 
    > 1. All database DateTime fields are converted to timezone aware Timestamp 
fields. This impacts MySQL deployments particularly, as MySQL was storing 
DateTime fields, which cannot be made timezone aware. Also, to make sure 
conversion happens properly we set the connection time zone to UTC. This is 
supported by Postgres and MySQL. However, it is not supported by SQL Server. So 
if you are running outside of UTC you need to take special care when upgrading.
    > 
    > 2. Thou shalt not use datetime.now() and datetime.utcnow() when writing 
code for core Airflow (operators, sensors, scheduler etc.); in DAGs you can 
still use them. Both create naive datetimes (yes, even utcnow()). You can use 
utcnow() from airflow.utils.timezone for this. As you will not be able to store 
naive datetime fields anymore, you will notice soon enough.
    > 
    > Finally, and that is the main reason for this email, I am looking for 
feedback and testers. The PR can be found here: 
https://github.com/apache/incubator-airflow/pull/2781. It doesn’t pass the tests 
yet, but you can see that I am working hard on that ;-).
    > 
    > Cheers
    > Bolke
    
    
