Yeah, I figured it out. Originally I was using a different implementation of UTCDateTime, but that was unmaintained. I switched, but this version has a different contract: while it transforms values to UTC on storing, it does not do so when it receives timezone-aware fields from the DB. Hence the issue.
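For context, a minimal sketch of the kind of DateTime extension being described: a SQLAlchemy TypeDecorator that normalises to UTC both on the way in (bind) and on the way out (result). The class and its names are illustrative only, not the eventual Airflow implementation.

```python
# Hedged sketch only: a DateTime wrapper that converts to UTC in BOTH
# directions, unlike the sqlalchemy-utc behaviour described in this thread,
# which converts on store but passes tz-aware result values through as-is.
import datetime

from sqlalchemy.types import DateTime, TypeDecorator

utc = datetime.timezone.utc


class UtcAwareDateTime(TypeDecorator):
    """Illustrative stand-in for an Airflow-owned UTCDateTime extension."""

    impl = DateTime(timezone=True)
    cache_ok = True

    def process_bind_param(self, value, dialect):
        # Convert to UTC before storing; reject naive datetimes outright.
        if value is not None:
            if value.tzinfo is None:
                raise ValueError("naive datetime passed to UtcAwareDateTime")
            value = value.astimezone(utc)
        return value

    def process_result_value(self, value, dialect):
        # Postgres returns datetimes in the *connection* timezone (e.g. a
        # psycopg2 FixedOffsetTimezone); normalise back to UTC here.
        if value is not None and value.tzinfo is not None:
            value = value.astimezone(utc)
        return value
```

With something like this in place, a `2018-08-04 02:00:00+02` coming back from the DB would surface as `2018-08-04 00:00:00+00:00`, matching what the scheduler logged for the first run.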
I will prepare a PR that removes the dependency and implements our own extension of DateTime. Probably tomorrow. Good catch! Just in time :-(.

B.

> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor <ash_airflowl...@firemirror.com> wrote:
>
> Entirely possible, though I wasn't even dealing with the scheduler - the issue I was addressing was entirely in the webserver for a pre-existing Task Instance.
>
> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears that isn't working right/as expected. This line:
> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
> doesn't look right for us - as you mentioned, the TZ is set to something (rather than having no TZ value).
>
> Some background on how Postgres handles TZs: it always returns datetimes in the TZ of the connection. I'm not sure if this is unique to Postgres or if other DBs behave the same.
>
> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time zone;
>       timestamptz
> ------------------------
>  2018-08-03 01:00:00+01
>
> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>       timestamptz
> ------------------------
>  2018-08-03 01:00:00+01
>
> The server will always return TZs in the connection timezone.
> postgres=# set timezone=utc;
> SET
> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>       timestamptz
> ------------------------
>  2018-08-03 00:00:00+00
> (1 row)
>
> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
>       timestamptz
> ------------------------
>  2018-08-03 00:00:00+00
> (1 row)
>
> -ash
>
>> On 5 Aug 2018, at 21:28, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>
>> This is the issue:
>>
>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 00:00:00+00:00 tzinfo: <Timezone [UTC]>
>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun example_http_operator @ 2018-08-03 02:00:00+02:00: scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
>>
>> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)
>> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun example_http_operator @ 2018-08-04 02:00:00+02:00: scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
>>
>> Notice in lines 1+2 that the next run date is correctly in UTC, but from the DB it gets a +2. In the next bit (3+4) we get a psycopg2.tz.FixedOffsetTimezone, which should be set to UTC according to the specs of https://github.com/spoqa/sqlalchemy-utc, but it isn't.
>>
>> So changing your setting of the DB to UTC fixes the symptom but not the cause.
>>
>> B.
>>
>>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor <ash_airflowl...@firemirror.com> wrote:
>>>
>>> Sorry for being terse before.
>>> So the issue is that the ts loaded from the DB is not in UTC, it's in GB/+01 (the default of the DB server).
>>>
>>> For me, on a currently running 1.9 (no TZ) db:
>>>
>>> airflow=# select * from task_instance;
>>> get_op | example_http_operator | 2018-07-23 00:00:00
>>>
>>> This date time appears in the log url, and the path it looks at on S3 is
>>>
>>> .../example_http_operator/2018-07-23T00:00:00/1.log
>>>
>>> If my postgres server has a default timezone of GB (which the one running on my laptop does), and I then apply the migration, then it is converted to that local time:
>>>
>>> airflow=# select * from task_instance;
>>> get_op | example_http_operator | 2018-07-23 01:00:00+01
>>>
>>> airflow=# set timezone=UTC;
>>> airflow=# select * from task_instance;
>>> get_op | example_http_operator | 2018-07-23 00:00:00+00
>>>
>>> This is all okay so far. The migration has kept the column at the same moment in time.
>>>
>>> The issue comes when the UI tries to display logs for this old task: because the timezone of the connection is not UTC, PG returns a date with a +01 TZ. Thus after the migration this old task tries to look for a log file of
>>>
>>> .../example_http_operator/2018-07-23T01:00:00/1.log
>>>
>>> which doesn't exist - it's changed the time it has rendered from midnight (in v1.9) to 1am (in v1.10).
>>>
>>> (This is with my change to log_filename_template from UPDATING.md in my other branch.)
>>>
>>> Setting the timezone to UTC per connection means the behaviour of Airflow doesn't change depending on how the server is configured.
>>>
>>> -ash
>>>
>>>> On 5 Aug 2018, at 20:58, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>
>>>> Digging in a bit further.
>>>>
>>>> {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
>>>>
>>>> is the format.
>>>>
>>>> ts = execution_date.isoformat() and should be in UTC afaik.
>>>>
>>>> Something is weird tbh.
>>>>
>>>> B.
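To make the effect on `ts` concrete, a small sketch (paths shortened like the examples in this thread) of how the same instant renders two different log paths, depending on whether execution_date comes back as naive UTC (1.9) or carrying a +01 connection-timezone offset (what psycopg2 hands back when the server timezone is GB):

```python
# Sketch: one moment in time, rendered as a log path the 1.9 way (naive,
# UTC) versus the way it appears when the DB connection timezone is +01
# and the tz-aware value is formatted directly.
import datetime

utc = datetime.timezone.utc
gb = datetime.timezone(datetime.timedelta(hours=1))  # stand-in for GB/+01

instant = datetime.datetime(2018, 7, 23, 0, 0, tzinfo=utc)

# 1.9 behaviour: naive UTC timestamp in the path.
path_19 = "example_http_operator/{}/1.log".format(
    instant.replace(tzinfo=None).isoformat())

# 1.10 symptom: value arrives in the connection TZ and keeps its offset.
path_110 = "example_http_operator/{}/1.log".format(
    instant.astimezone(gb).isoformat())

print(path_19)   # example_http_operator/2018-07-23T00:00:00/1.log
print(path_110)  # example_http_operator/2018-07-23T01:00:00+01:00/1.log
```

Same instant, two different S3 keys; hence the pre-existing log files become unreachable after the upgrade.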
>>>>> On 5 Aug 2018, at 21:32, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>
>>>>> Ash,
>>>>>
>>>>> Reading your proposed changes on your “set-timezone-to-utc” branch and the analysis below, I am not sure what you are perceiving as an issue.
>>>>>
>>>>> For conversion we assume everything is stored in UTC and in a naive format. Conversion then adds the timezone information. This results in the following:
>>>>>
>>>>> postgres timezone = “Europe/Amsterdam”
>>>>>
>>>>> airflow=# select * from task_instance;
>>>>> get_op | example_http_operator | 2018-07-27 02:00:00+02
>>>>>
>>>>> airflow=# set timezone=UTC;
>>>>> airflow=# select * from task_instance;
>>>>> get_op | example_http_operator | 2018-07-27 00:00:00+00
>>>>>
>>>>> If we don’t set the timezone in the connection, Postgres assumes the server timezone (in my case “Europe/Amsterdam”). So every datetime Airflow receives will be in “Europe/Amsterdam” format. However, as we defined the model to use UTCDateTime, it will always convert the returned DateTime to UTC.
>>>>>
>>>>> If we have configured Airflow to use something other than UTC as the default timezone, or a DAG has an associated timezone, we only convert to that timezone when calculating the next runtime (not for cron, btw). Nowhere else, and thus we are UTC everywhere.
>>>>>
>>>>> What do you think is inconsistent?
>>>>>
>>>>> Bolke
>>>>>
>>>>>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor <ash_airflowl...@firemirror.com> wrote:
>>>>>>
>>>>>> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone-aware columns in the task instance is right, or at least it's not what I expected.
>>>>>>
>>>>>> Before, weren't all TZs from schedule dates etc in UTC?
>>>>>> For the same task instance (these outputs from psql directly):
>>>>>>
>>>>>> before: execution_date=2017-09-04 00:00:00
>>>>>> after: execution_date=2017-09-04 01:00:00+01
>>>>>>
>>>>>> **Okay, the migration is fine**. It appears that the migration has done the right thing, but the local DB I'm testing with has a timezone of GB set, so Postgres converts it to that TZ on returning an object.
>>>>>>
>>>>>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have consistent behaviour? Is this possible somehow? I don't know SQLAlchemy that well.
>>>>>>
>>>>>> -ash
>>>>>>
>>>>>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor <ash_airflowl...@firemirror.com> wrote:
>>>>>>>
>>>>>>> 1) Missing UPDATING note about the change of task_log_reader to now always being "task" (was "s3.task" before). Logging config is much simpler now, though. This may be particular to my logging config, but given how much of a pain it was to set up S3 logging in 1.9, I have shared my config with some people in the Gitter chat, so it's not just me.
>>>>>>>
>>>>>>> 2) The path that log files are written to in S3 has changed (again - this happened from 1.8 to 1.9). I'd like to avoid having to move all of my log files again to continue viewing them. The change is that the path now (in 1.10) has a timezone in it, and the date is in local time; before it was UTC:
>>>>>>>
>>>>>>> before: 2018-07-23T00:00:00/1.log
>>>>>>> after: 2018-07-23T01:00:00+01:00/1.log
>>>>>>>
>>>>>>> We can possibly get away with an updating note about this to set a custom log_filename_template. Testing this now.
>>>>>>>
>>>>>>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor <a...@firemirror.com> wrote:
>>>>>>>>
>>>>>>>> -1 (binding) from me.
>>>>>>>> Installed with:
>>>>>>>>
>>>>>>>> AIRFLOW_GPL_UNIDECODE=yes pip install 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr, s3, crypto]>=1.10'
>>>>>>>>
>>>>>>>> Install went fine.
>>>>>>>>
>>>>>>>> Our DAGs that use SparkSubmitOperator are now failing, as there is now a hard dependency on the Kubernetes client libs, but the `emr` group doesn't mention this.
>>>>>>>>
>>>>>>>> Introduced in https://github.com/apache/incubator-airflow/pull/3112
>>>>>>>>
>>>>>>>> I see two options for this: either conditionally enable k8s:// support if the import works, or (less preferred) add kube-client to the emr deps.
>>>>>>>>
>>>>>>>> Sorry - this is the first time I've been able to test it.
>>>>>>>>
>>>>>>>> I will install this dep manually and continue testing.
>>>>>>>>
>>>>>>>> -ash
>>>>>>>>
>>>>>>>> (Normally no time at home due to new baby, but I got a standing desk and a carrier, meaning she can sleep on me and I can use my laptop. Win!)
>>>>>>>>
>>>>>>>>> On 4 Aug 2018, at 22:32, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Bump.
>>>>>>>>>
>>>>>>>>> Committers please cast your vote.
>>>>>>>>>
>>>>>>>>> B.
>>>>>>>>> Sent from my iPhone
>>>>>>>>>
>>>>>>>>>> On 3 Aug 2018, at 13:23, Driesprong, Fokko <fo...@driesprong.frl> wrote:
>>>>>>>>>>
>>>>>>>>>> +1 Binding
>>>>>>>>>>
>>>>>>>>>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>>>>>>>>>>
>>>>>>>>>> Cheers, Fokko
>>>>>>>>>>
>>>>>>>>>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin <bdbr...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Hey all,
>>>>>>>>>>>
>>>>>>>>>>> I have cut Airflow 1.10.0 RC3. This email is calling a vote on the release, which will last for 72 hours. Consider this my (binding) +1.
>>>>>>>>>>>
>>>>>>>>>>> Airflow 1.10.0 RC3 is available at:
>>>>>>>>>>>
>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/
>>>>>>>>>>>
>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release that comes with INSTALL instructions.
>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python "sdist" release.
>>>>>>>>>>>
>>>>>>>>>>> Public keys are available at:
>>>>>>>>>>>
>>>>>>>>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/
>>>>>>>>>>>
>>>>>>>>>>> The number of JIRAs fixed is over 700. Please have a look at the changelog.
>>>>>>>>>>> Since RC2 the following has been fixed:
>>>>>>>>>>>
>>>>>>>>>>> * [AIRFLOW-2817] Force explicit choice on GPL dependency
>>>>>>>>>>> * [AIRFLOW-2716] Replace async and await py3.7 keywords
>>>>>>>>>>> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
>>>>>>>>>>>
>>>>>>>>>>> Please note that the version number excludes the `rcX` string as well as the "+incubating" string, so it's now simply 1.10.0. This will allow us to rename the artifact without modifying the artifact checksums when we actually release.
>>>>>>>>>>>
>>>>>>>>>>> WARNING: Due to licensing requirements you will need to set SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when installing or upgrading. We will try to remove this requirement for the next release.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Bolke
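As a footnote to the question raised earlier in the thread about setting the connection TZ to UTC in SQLAlchemy: one standard way is a connect-event listener that issues SET timezone on every new DBAPI connection. This is a sketch of the general technique, not necessarily the fix that landed in Airflow; the engine URL is illustrative.

```python
# Hedged sketch: pin every new Postgres connection to UTC so the server's
# default timezone can no longer influence what the application reads back.
from sqlalchemy import create_engine, event


def force_utc(engine):
    """Attach a connect-time hook that pins the session timezone to UTC."""

    @event.listens_for(engine, "connect")
    def set_utc_timezone(dbapi_connection, connection_record):
        # Runs once per new DBAPI connection, before it enters the pool.
        cursor = dbapi_connection.cursor()
        cursor.execute("SET timezone TO 'UTC'")
        cursor.close()

    return set_utc_timezone


# Illustrative usage (URL is an assumption, not an Airflow default):
#   engine = create_engine("postgresql://localhost/airflow")
#   force_utc(engine)
```

With this in place, the SELECTs shown earlier in the thread would always come back with a +00 offset regardless of the server's TimeZone setting, making Airflow's behaviour independent of how the server is configured.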