+1 :-)

Sent from my iPhone
> On 5 Aug 2018, at 23:08, Ash Berlin-Taylor <[email protected]> wrote:
>
> Yup, just worked out the same thing.
>
> I think as "punishment" for me finding bugs so late in two RCs (this, and 1.9) I should run the release for the next release.
>
> -ash
>
>> On 5 Aug 2018, at 22:05, Bolke de Bruin <[email protected]> wrote:
>>
>> Yeah, I figured it out. Originally I was using a different implementation of UTCDateTime, but that was unmaintained. I switched, but this version changed or has a different contract: while it transforms to UTC on storing, it does not do so when it receives timezone-aware fields from the db. Hence the issue.
>>
>> I will prepare a PR that removes the dependency and implements our own extension of DateTime. Probably tomorrow.
>>
>> Good catch! Just in time :-(.
>>
>> B.
>>
>>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor <[email protected]> wrote:
>>>
>>> Entirely possible, though I wasn't even dealing with the scheduler - the issue I was addressing was entirely in the webserver for a pre-existing Task Instance.
>>>
>>> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears that isn't working right/as expected. This line:
>>> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
>>> doesn't look right for us - as you mentioned, the TZ is set to something (rather than having no TZ value).
>>>
>>> Some background on how Pg handles TZs: it always returns DTs in the TZ of the connection. I'm not sure if this is unique to postgres or if other DBs behave the same.
>>>
>>> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time zone;
>>>       timestamptz
>>> ------------------------
>>> 2018-08-03 01:00:00+01
>>>
>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>>       timestamptz
>>> ------------------------
>>> 2018-08-03 01:00:00+01
>>>
>>> The server will always return TZs in the connection timezone.
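[Editor's note: the "own extension of DateTime" Bolke proposes above could look roughly like the sketch below. This is an illustration, not code from the thread; the class name `UtcDateTime` and its exact behaviour are assumptions. The key difference from the sqlalchemy-utc type discussed is that `process_result_value` also normalises tz-aware values coming *back* from the database to UTC.]

```python
import datetime

from sqlalchemy import types


class UtcDateTime(types.TypeDecorator):
    """Store tz-aware datetimes as UTC and always hand back UTC.

    Hypothetical sketch: unlike the sqlalchemy-utc type discussed in the
    thread, this also converts tz-aware values returned by the DB (e.g.
    psycopg2's FixedOffsetTimezone) back to UTC.
    """

    impl = types.DateTime(timezone=True)

    def process_bind_param(self, value, dialect):
        # Refuse naive datetimes so nothing ambiguous reaches the DB.
        if value is not None:
            if value.tzinfo is None:
                raise ValueError("naive datetime is not allowed")
            value = value.astimezone(datetime.timezone.utc)
        return value

    def process_result_value(self, value, dialect):
        # Postgres returns values in the *connection* timezone (+01, +02,
        # ...); normalise back to UTC so callers always see one timezone.
        if value is not None and value.tzinfo is not None:
            value = value.astimezone(datetime.timezone.utc)
        return value
```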
>>> postgres=# set timezone=utc;
>>> SET
>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>>       timestamptz
>>> ------------------------
>>> 2018-08-03 00:00:00+00
>>> (1 row)
>>>
>>> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
>>>       timestamptz
>>> ------------------------
>>> 2018-08-03 00:00:00+00
>>> (1 row)
>>>
>>> -ash
>>>
>>>> On 5 Aug 2018, at 21:28, Bolke de Bruin <[email protected]> wrote:
>>>>
>>>> This is the issue:
>>>>
>>>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 00:00:00+00:00 tzinfo: <Timezone [UTC]>
>>>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun example_http_operator @ 2018-08-03 02:00:00+02:00: scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
>>>>
>>>> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)
>>>> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun example_http_operator @ 2018-08-04 02:00:00+02:00: scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
>>>>
>>>> Notice in the first pair of log lines that the next run date is correctly in UTC, but from the DB it gets a +2. In the second pair we get a psycopg2.tz.FixedOffsetTimezone, which should be set to UTC according to the specs of https://github.com/spoqa/sqlalchemy-utc, but it isn't.
>>>>
>>>> So changing your setting of the DB to UTC fixes the symptom but not the cause.
>>>>
>>>> B.
>>>>
>>>>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor <[email protected]> wrote:
>>>>>
>>>>> Sorry for being terse before.
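[Editor's note: the pair of NEXT RUN DATE log lines quoted above can be reproduced with stdlib datetimes. The two values are the *same instant*, but their string renderings differ, which is why the scheduled run id changes from `...T00:00:00+00:00` to `...T02:00:00+02:00`. `timezone(timedelta(hours=2))` stands in for psycopg2's `FixedOffsetTimezone(offset=120)`.]

```python
from datetime import datetime, timedelta, timezone

utc = timezone.utc
plus2 = timezone(timedelta(hours=2))  # stand-in for FixedOffsetTimezone(offset=120)

from_scheduler = datetime(2018, 8, 3, 0, 0, tzinfo=utc)
from_db = datetime(2018, 8, 3, 2, 0, tzinfo=plus2)

# Same moment in time...
print(from_scheduler == from_db)       # True
# ...but different string renderings, hence different run ids / paths:
print(from_scheduler.isoformat())      # 2018-08-03T00:00:00+00:00
print(from_db.isoformat())             # 2018-08-03T02:00:00+02:00
```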
>>>>> So the issue is that the ts loaded from the DB is not in UTC, it's in GB/+01 (the default of the DB server).
>>>>>
>>>>> For me, on a currently running 1.9 (no TZ) db:
>>>>>
>>>>> airflow=# select * from task_instance;
>>>>> get_op | example_http_operator | 2018-07-23 00:00:00
>>>>>
>>>>> This date time appears in the log url, and the path it looks at on S3 is
>>>>>
>>>>> .../example_http_operator/2018-07-23T00:00:00/1.log
>>>>>
>>>>> If my postgres server has a default timezone of GB (which the one running on my laptop does), and I then apply the migration, it is converted to that local time:
>>>>>
>>>>> airflow=# select * from task_instance;
>>>>> get_op | example_http_operator | 2018-07-23 01:00:00+01
>>>>>
>>>>> airflow=# set timezone=UTC;
>>>>> airflow=# select * from task_instance;
>>>>> get_op | example_http_operator | 2018-07-23 00:00:00+00
>>>>>
>>>>> This is all okay so far. The migration has kept the column at the same moment in time.
>>>>>
>>>>> The issue comes when the UI tries to display logs for this old task: because the timezone of the connection is not UTC, PG returns a date with a +01 TZ. Thus after the migration this old task tries to look for a log file of
>>>>>
>>>>> .../example_http_operator/2018-07-23T01:00:00/1.log
>>>>>
>>>>> which doesn't exist - it's changed the time it has rendered from midnight (in v1.9) to 1am (in v1.10).
>>>>>
>>>>> (This is with my change to log_filename_template from UPDATING.md in my other branch.)
>>>>>
>>>>> Setting the timezone to UTC per connection means the behaviour of Airflow doesn't change depending on how the server is configured.
>>>>>
>>>>> -ash
>>>>>
>>>>>> On 5 Aug 2018, at 20:58, Bolke de Bruin <[email protected]> wrote:
>>>>>>
>>>>>> Digging in a bit further.
>>>>>> {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
>>>>>>
>>>>>> is the format.
>>>>>>
>>>>>> ts = execution_date.isoformat() and should be in UTC afaik.
>>>>>>
>>>>>> Something is weird tbh.
>>>>>>
>>>>>> B.
>>>>>>
>>>>>>> On 5 Aug 2018, at 21:32, Bolke de Bruin <[email protected]> wrote:
>>>>>>>
>>>>>>> Ash,
>>>>>>>
>>>>>>> Reading your proposed changes on your "set-timezone-to-utc" branch and the analysis below, I am not sure what you are perceiving as an issue.
>>>>>>>
>>>>>>> For conversion we assume everything is stored in UTC and in a naive format. Conversion then adds the timezone information. This results in the following:
>>>>>>>
>>>>>>> postgres timezone = "Europe/Amsterdam"
>>>>>>>
>>>>>>> airflow=# select * from task_instance;
>>>>>>> get_op | example_http_operator | 2018-07-27 02:00:00+02
>>>>>>>
>>>>>>> airflow=# set timezone=UTC;
>>>>>>> airflow=# select * from task_instance;
>>>>>>> get_op | example_http_operator | 2018-07-27 00:00:00+00
>>>>>>>
>>>>>>> If we don't set the timezone in the connection, postgres assumes the server timezone (in my case "Europe/Amsterdam"). So every datetime Airflow receives will be in "Europe/Amsterdam" format. However, as we defined the model to use UTCDateTime, it will always convert the returned DateTime to UTC.
>>>>>>>
>>>>>>> If we have configured Airflow to use something other than UTC as the default timezone, or a DAG has an associated timezone, we only convert to that timezone when calculating the next runtime (not for cron, btw). Nowhere else, and thus we are UTC everywhere.
>>>>>>>
>>>>>>> What do you think is inconsistent?
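[Editor's note: the log-path mismatch the thread is chasing falls out of how `ts` (the ISO-formatted execution_date) renders. A minimal stand-in for rendering the template is sketched below; the helper `log_path` is hypothetical (the real template is Jinja), but the `isoformat()` outputs are exact.]

```python
from datetime import datetime, timedelta, timezone


def log_path(dag_id, task_id, execution_date, try_number):
    # Hypothetical stand-in for Airflow's Jinja log_filename_template:
    # "{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log"
    return "{}/{}/{}/{}.log".format(
        dag_id, task_id, execution_date.isoformat(), try_number)


# Pre-1.10: execution_date was a naive UTC datetime.
naive = datetime(2018, 7, 23, 0, 0)
print(log_path("example_http_operator", "get_op", naive, 1))
# example_http_operator/get_op/2018-07-23T00:00:00/1.log

# 1.10 with the connection in GB/+01: same instant, but tz-aware in +01.
aware = datetime(2018, 7, 23, 1, 0, tzinfo=timezone(timedelta(hours=1)))
print(log_path("example_http_operator", "get_op", aware, 1))
# example_http_operator/get_op/2018-07-23T01:00:00+01:00/1.log
```

Both paths describe the same instant, but only the first matches files written by 1.9, which is exactly the breakage Ash describes.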
>>>>>>> Bolke
>>>>>>>
>>>>>>>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone-aware columns in the task instance is right, or at least it's not what I expected.
>>>>>>>>
>>>>>>>> Before, weren't all TZs from schedule dates etc. in UTC? For the same task instance (these outputs from psql directly):
>>>>>>>>
>>>>>>>> before: execution_date=2017-09-04 00:00:00
>>>>>>>> after: execution_date=2017-09-04 01:00:00+01
>>>>>>>>
>>>>>>>> **Okay, the migration is fine.** It appears that the migration has done the right thing, but the local DB I'm testing with has a timezone of GB set, so Postgres converts it to that TZ on returning an object.
>>>>>>>>
>>>>>>>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have consistent behaviour? Is this possible somehow? I don't know SQLAlchemy that well.
>>>>>>>>
>>>>>>>> -ash
>>>>>>>>
>>>>>>>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> 1) Missing UPDATING note about the change of task_log_reader to now always being "task" (it was "s3.task" before). Logging config is much simpler now, though. This may be particular to my logging config, but given how much of a pain it was to set up S3 logging in 1.9, I have shared my config with some people in the Gitter chat, so it's not just me.
>>>>>>>>>
>>>>>>>>> 2) The path that log files are written to in S3 has changed (again - this happened from 1.8 to 1.9). I'd like to avoid having to move all of my log files again to continue viewing them.
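[Editor's note: to Ash's question 3) above - yes, SQLAlchemy can pin a connection's timezone via an engine "connect" event listener. A sketch is below; it uses in-memory sqlite so it is runnable anywhere, with the Postgres statement shown in a comment. The DSN and the placeholder sqlite statement are illustrative assumptions.]

```python
from sqlalchemy import create_engine, event, text

# Illustrative engine; for Airflow this would be the metadata DB, e.g.
# create_engine("postgresql+psycopg2://airflow@localhost/airflow").
engine = create_engine("sqlite://")


@event.listens_for(engine, "connect")
def set_utc(dbapi_connection, connection_record):
    # Runs once for every new DBAPI connection in the pool.
    cursor = dbapi_connection.cursor()
    # On Postgres you would run: cursor.execute("SET timezone TO 'UTC'")
    cursor.execute("PRAGMA user_version")  # harmless placeholder for sqlite
    cursor.close()


with engine.connect() as conn:
    print(conn.execute(text("select 1")).scalar())
```

With the Postgres variant, every connection returns timestamptz values in UTC regardless of the server's default timezone, which is the consistent behaviour Ash is after.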
>>>>>>>>> The change is that the path now (in 1.10) has a timezone in it, and the date is in local time; before it was UTC:
>>>>>>>>>
>>>>>>>>> before: 2018-07-23T00:00:00/1.log
>>>>>>>>> after: 2018-07-23T01:00:00+01:00/1.log
>>>>>>>>>
>>>>>>>>> We can possibly get away with an updating note about this to set a custom log_filename_template. Testing this now.
>>>>>>>>>
>>>>>>>>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> -1 (binding) from me.
>>>>>>>>>>
>>>>>>>>>> Installed with:
>>>>>>>>>>
>>>>>>>>>> AIRFLOW_GPL_UNIDECODE=yes pip install 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr,s3,crypto]>=1.10'
>>>>>>>>>>
>>>>>>>>>> Install went fine.
>>>>>>>>>>
>>>>>>>>>> Our DAGs that use SparkSubmitOperator are now failing, as there is now a hard dependency on the Kubernetes client libs, but the `emr` group doesn't mention this.
>>>>>>>>>>
>>>>>>>>>> Introduced in https://github.com/apache/incubator-airflow/pull/3112
>>>>>>>>>>
>>>>>>>>>> I see two options for this: either conditionally enable k8s:// support if the import works, or (less preferred) add kube-client to the emr deps.
>>>>>>>>>>
>>>>>>>>>> Sorry - this is the first time I've been able to test it.
>>>>>>>>>>
>>>>>>>>>> I will install this dep manually and continue testing.
>>>>>>>>>>
>>>>>>>>>> -ash
>>>>>>>>>>
>>>>>>>>>> (Normally no time at home due to new baby, but I got a standing desk and a carrier, meaning she can sleep on me and I can use my laptop. Win!)
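[Editor's note: Ash's preferred option - "conditionally enable k8s:// support if the import works" - is a standard guarded-import pattern. The sketch below is an illustration only; `resolve_spark_master` is a hypothetical helper, not Airflow code.]

```python
# Guarded import: only enable k8s:// masters when the kubernetes
# client library is actually installed, instead of a hard dependency.
try:
    import kubernetes  # noqa: F401
    HAS_KUBERNETES = True
except ImportError:
    HAS_KUBERNETES = False


def resolve_spark_master(master):
    # Hypothetical helper: reject k8s:// masters when the client libs
    # are absent, with an actionable error instead of an ImportError
    # at DAG-parse time.
    if master.startswith("k8s://") and not HAS_KUBERNETES:
        raise RuntimeError(
            "Spark master %r requires the kubernetes client libraries; "
            "install them with: pip install kubernetes" % master)
    return master


print(resolve_spark_master("yarn"))  # yarn
```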
>>>>>>>>>>> On 4 Aug 2018, at 22:32, Bolke de Bruin <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Bump.
>>>>>>>>>>>
>>>>>>>>>>> Committers, please cast your vote.
>>>>>>>>>>>
>>>>>>>>>>> B.
>>>>>>>>>>>
>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>>
>>>>>>>>>>>> On 3 Aug 2018, at 13:23, Driesprong, Fokko <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> +1 Binding
>>>>>>>>>>>>
>>>>>>>>>>>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers, Fokko
>>>>>>>>>>>>
>>>>>>>>>>>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin <[email protected]>:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have cut Airflow 1.10.0 RC3. This email is calling a vote on the release, which will last for 72 hours. Consider this my (binding) +1.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Airflow 1.10.0 RC3 is available at:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/
>>>>>>>>>>>>>
>>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release that comes with INSTALL instructions.
>>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python "sdist" release.
>>>>>>>>>>>>> Public keys are available at:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/
>>>>>>>>>>>>>
>>>>>>>>>>>>> The number of JIRAs fixed is over 700. Please have a look at the changelog. Since RC2 the following have been fixed:
>>>>>>>>>>>>>
>>>>>>>>>>>>> * [AIRFLOW-2817] Force explicit choice on GPL dependency
>>>>>>>>>>>>> * [AIRFLOW-2716] Replace async and await py3.7 keywords
>>>>>>>>>>>>> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please note that the version number excludes the `rcX` string as well as the "+incubating" string, so it's now simply 1.10.0. This will allow us to rename the artifact without modifying the artifact checksums when we actually release.
>>>>>>>>>>>>>
>>>>>>>>>>>>> WARNING: Due to licensing requirements you will need to set SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when installing or upgrading. We will try to remove this requirement for the next release.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Bolke
