Could you try upgrading to 1.9 first and see if that helps? -ash
> On 8 Aug 2018, at 00:07, George Leslie-Waksman <[email protected]> wrote:
>
> We just tried to upgrade a 1.8.1 install to 1.10rc3 and ran into a critical error on alembic migration execution. I have captured the issue in JIRA:
> https://issues.apache.org/jira/browse/AIRFLOW-2870
>
> I would consider this a critical blocker for release because it hard blocks upgrading.
>
> George
>
> On Tue, Aug 7, 2018 at 7:58 AM Bolke de Bruin <[email protected]> wrote:
>
>> Done. When I roll rc4 it will be part of it.
>>
>>> On 7 Aug 2018, at 16:26, Naik Kaxil <[email protected]> wrote:
>>>
>>> @bolke Can we also include the following commit in the 1.10 release, as we would need it to generate docs at ReadTheDocs?
>>>
>>> - https://github.com/apache/incubator-airflow/commit/8af0aa96bfe3caa51d67ab393db069d37b0c4169
>>>
>>> Regards,
>>> Kaxil
>>>
>>> On 06/08/2018, 14:59, "James Meickle" <[email protected]> wrote:
>>>
>>>     Not a vote, but a comment: it might be worth noting that the new environment variable is also required if you have any Airflow plugin test suites that install Airflow as part of their dependencies. In my case, I had to set the new env var outside of tox and add this:
>>>
>>>     ```
>>>     [testenv]
>>>     passenv = SLUGIFY_USES_TEXT_UNIDECODE
>>>     ```
>>>
>>>     (`setenv` did not work, as that provides env vars at runtime but not at install time, as far as I can tell.)
>>>
>>>     On Sun, Aug 5, 2018 at 5:20 PM Bolke de Bruin <[email protected]> wrote:
>>>
>>>> +1 :-)
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On 5 Aug 2018, at 23:08, Ash Berlin-Taylor <[email protected]> wrote:
>>>>>
>>>>> Yup, just worked out the same thing.
>>>>>
>>>>> I think as "punishment" for me finding bugs so late in two RCs (this, and 1.9) I should run the release for the next release.
>>>>>
>>>>> -ash
>>>>>
>>>>>> On 5 Aug 2018, at 22:05, Bolke de Bruin <[email protected]> wrote:
>>>>>>
>>>>>> Yeah, I figured it out. Originally I was using a different implementation of UTCDateTime, but that was unmaintained. I switched, but this version changed or has a different contract: while it converts to UTC on storing, it does not do so when it receives timezone-aware fields from the db. Hence the issue.
>>>>>>
>>>>>> I will prepare a PR that removes the dependency and implements our own extension of DateTime. Probably tomorrow.
>>>>>>
>>>>>> Good catch! Just in time :-(.
>>>>>>
>>>>>> B.
>>>>>>
>>>>>>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor <[email protected]> wrote:
>>>>>>>
>>>>>>> Entirely possible, though I wasn't even dealing with the scheduler - the issue I was addressing was entirely in the webserver, for a pre-existing Task Instance.
>>>>>>>
>>>>>>> Ah, I hadn't noticed/twigged that we are using sqlalchemy-utc. It appears that isn't working right/as expected. This line: https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34 doesn't look right for us - as you mentioned, the TZ is set to something (rather than having no TZ value).
>>>>>>>
>>>>>>> Some background on how Pg handles TZs: it always returns DTs in the TZ of the connection. I'm not sure if this is unique to postgres or if other DBs behave the same.
>>>>>>>
>>>>>>> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time zone;
>>>>>>>       timestamptz
>>>>>>> ------------------------
>>>>>>> 2018-08-03 01:00:00+01
>>>>>>>
>>>>>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>>>>>>       timestamptz
>>>>>>> ------------------------
>>>>>>> 2018-08-03 01:00:00+01
>>>>>>>
>>>>>>> The server will always return TZs in the connection timezone.
>>>>>>>
>>>>>>> postgres=# set timezone=utc;
>>>>>>> SET
>>>>>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>>>>>>       timestamptz
>>>>>>> ------------------------
>>>>>>> 2018-08-03 00:00:00+00
>>>>>>> (1 row)
>>>>>>>
>>>>>>> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
>>>>>>>       timestamptz
>>>>>>> ------------------------
>>>>>>> 2018-08-03 00:00:00+00
>>>>>>> (1 row)
>>>>>>>
>>>>>>> -ash
>>>>>>>
>>>>>>>> On 5 Aug 2018, at 21:28, Bolke de Bruin <[email protected]> wrote:
>>>>>>>>
>>>>>>>> This is the issue:
>>>>>>>>
>>>>>>>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 00:00:00+00:00 tzinfo: <Timezone [UTC]>
>>>>>>>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun example_http_operator @ 2018-08-03 02:00:00+02:00: scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
>>>>>>>>
>>>>>>>> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)
>>>>>>>> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun example_http_operator @ 2018-08-04 02:00:00+02:00: scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
>>>>>>>>
>>>>>>>> Notice in lines 1+2 that the next run date is correctly in UTC, but from the DB it gets a +2. In the next bit (3+4) we get a psycopg2.tz.FixedOffsetTimezone, which should be set to UTC according to the specs of https://github.com/spoqa/sqlalchemy-utc, but it isn't.
>>>>>>>>
>>>>>>>> So changing your setting of the DB to UTC fixes the symptom but not the cause.
>>>>>>>>
>>>>>>>> B.
>>>>>>>>
>>>>>>>>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Sorry for being terse before.
>>>>>>>>>
>>>>>>>>> So the issue is that the ts loaded from the DB is not in UTC, it's in GB/+01 (the default of the DB server).
>>>>>>>>>
>>>>>>>>> For me, on a currently running 1.9 (no TZ) db:
>>>>>>>>>
>>>>>>>>> airflow=# select * from task_instance;
>>>>>>>>> get_op | example_http_operator | 2018-07-23 00:00:00
>>>>>>>>>
>>>>>>>>> This date time appears in the log url, and the path it looks at on S3 is
>>>>>>>>>
>>>>>>>>> .../example_http_operator/2018-07-23T00:00:00/1.log
>>>>>>>>>
>>>>>>>>> If my postgres server has a default timezone of GB (which the one running on my laptop does), and I then apply the migration, then it is converted to that local time.
>>>>>>>>>
>>>>>>>>> airflow=# select * from task_instance;
>>>>>>>>> get_op | example_http_operator | 2018-07-23 01:00:00+01
>>>>>>>>>
>>>>>>>>> airflow=# set timezone=UTC;
>>>>>>>>> airflow=# select * from task_instance;
>>>>>>>>> get_op | example_http_operator | 2018-07-23 00:00:00+00
>>>>>>>>>
>>>>>>>>> This is all okay so far. The migration has kept the column at the same moment in time.
>>>>>>>>>
>>>>>>>>> The issue comes when the UI tries to display logs for this old task: because the timezone of the connection is not UTC, PG returns a date with a +01 TZ. Thus after the migration this old task tries to look for a log file of
>>>>>>>>>
>>>>>>>>> .../example_http_operator/2018-07-23T01:00:00/1.log
>>>>>>>>>
>>>>>>>>> which doesn't exist - it's changed the time it renders from midnight (in v1.9) to 1am (in v1.10).
>>>>>>>>>
>>>>>>>>> (This is with my change to log_filename_template from UPDATING.md in my other branch.)
>>>>>>>>>
>>>>>>>>> Setting the timezone to UTC per connection means the behaviour of Airflow doesn't change depending on how the server is configured.
>>>>>>>>>
>>>>>>>>> -ash
>>>>>>>>>
>>>>>>>>>> On 5 Aug 2018, at 20:58, Bolke de Bruin <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Digging in a bit further.
>>>>>>>>>>
>>>>>>>>>> {{{{ ti.dag_id }}}}/{{{{ ti.task_id }}}}/{{{{ ts }}}}/{{{{ try_number }}}}.log
>>>>>>>>>>
>>>>>>>>>> is the format.
>>>>>>>>>>
>>>>>>>>>> ts = execution_date.isoformat and should be in UTC afaik.
>>>>>>>>>>
>>>>>>>>>> Something is weird tbh.
>>>>>>>>>>
>>>>>>>>>> B.
>>>>>>>>>>
>>>>>>>>>>> On 5 Aug 2018, at 21:32, Bolke de Bruin <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Ash,
>>>>>>>>>>>
>>>>>>>>>>> Reading your proposed changes on your "set-timezone-to-utc" branch and the analysis below, I am not sure what you are perceiving as an issue.
>>>>>>>>>>>
>>>>>>>>>>> For conversion we assume everything is stored in UTC and in a naive format. Conversion then adds the timezone information. This results in the following:
>>>>>>>>>>>
>>>>>>>>>>> postgres timezone = "Europe/Amsterdam"
>>>>>>>>>>>
>>>>>>>>>>> airflow=# select * from task_instance;
>>>>>>>>>>> get_op | example_http_operator | 2018-07-27 02:00:00+02
>>>>>>>>>>>
>>>>>>>>>>> airflow=# set timezone=UTC;
>>>>>>>>>>> airflow=# select * from task_instance;
>>>>>>>>>>> get_op | example_http_operator | 2018-07-27 00:00:00+00
>>>>>>>>>>>
>>>>>>>>>>> If we don't set the timezone in the connection, postgres assumes the server timezone (in my case "Europe/Amsterdam"). So every datetime Airflow receives will be in "Europe/Amsterdam" format. However, as we defined the model to use UTCDateTime, it will always convert the returned DateTime to UTC.
>>>>>>>>>>>
>>>>>>>>>>> If we have configured Airflow to use something other than UTC as the default timezone, or a DAG has an associated timezone, we only convert to that timezone when calculating the next run time (not for cron, btw). Nowhere else, and thus we are UTC everywhere.
>>>>>>>>>>>
>>>>>>>>>>> What do you think is inconsistent?
>>>>>>>>>>>
>>>>>>>>>>> Bolke
>>>>>>>>>>>
>>>>>>>>>>>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Relating to 2): I'm not sure that the upgrade from timezoneless to timezone-aware columns in the task instance is right, or at least it's not what I expected.
>>>>>>>>>>>>
>>>>>>>>>>>> Before, weren't all TZs from schedule dates etc. in UTC? For the same task instance (these outputs from psql directly):
>>>>>>>>>>>>
>>>>>>>>>>>> before: execution_date=2017-09-04 00:00:00
>>>>>>>>>>>> after: execution_date=2017-09-04 01:00:00+01
>>>>>>>>>>>>
>>>>>>>>>>>> **Okay, the migration is fine.**
>>>>>>>>>>>> It appears that the migration has done the right thing, but the local DB I'm testing with has a timezone of GB set, so Postgres converts it to that TZ when returning an object.
>>>>>>>>>>>>
>>>>>>>>>>>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have consistent behaviour? Is this possible somehow? I don't know SQLAlchemy that well.
>>>>>>>>>>>>
>>>>>>>>>>>> -ash
>>>>>>>>>>>>
>>>>>>>>>>>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Missing UPDATING note about the change of task_log_reader to now always being "task" (it was "s3.task" before). Logging config is much simpler now, though. This may be particular to my logging config, but given how much of a pain it was to set up S3 logging in 1.9, I have shared my config with some people in the Gitter chat, so it's not just me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) The path that log files are written to in S3 has changed (again - this happened from 1.8 to 1.9). I'd like to avoid having to move all of my log files again to continue viewing them. The change is that the path now (in 1.10) has a timezone in it, and the date is in local time, where before it was UTC:
>>>>>>>>>>>>>
>>>>>>>>>>>>> before: 2018-07-23T00:00:00/1.log
>>>>>>>>>>>>> after: 2018-07-23T01:00:00+01:00/1.log
>>>>>>>>>>>>>
>>>>>>>>>>>>> We can possibly get away with an updating note about this to set a custom log_filename_template. Testing this now.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -1 (binding) from me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Installed with:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> AIRFLOW_GPL_UNIDECODE=yes pip install 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr, s3, crypto]>=1.10'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Install went fine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Our DAGs that use SparkSubmitOperator are now failing, as there is now a hard dependency on the Kubernetes client libs, but the `emr` group doesn't mention this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Introduced in https://github.com/apache/incubator-airflow/pull/3112
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I see two options for this: either conditionally enable k8s:// support if the import works, or (less preferred) add kube-client to the emr deps.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sorry - this is the first time I've been able to test it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I will install this dep manually and continue testing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -ash
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (Normally no time at home due to new baby, but I got a standing desk and a carrier, meaning she can sleep on me and I can use my laptop. Win!)
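The first option Ash mentions above (conditionally enabling k8s:// support only if the import works) could look roughly like the sketch below. This is illustrative only, not the change that actually landed in Airflow; the helper name and message are made up.

```
# Sketch of an import guard for the optional Kubernetes client dependency.
# Names below are illustrative, not Airflow's actual implementation.
try:
    import kubernetes  # noqa: F401 - only needed for k8s:// masters
    HAS_KUBERNETES = True
except ImportError:
    HAS_KUBERNETES = False


def check_spark_master(master):
    """Fail with a clear message only when k8s:// is actually requested."""
    if master.startswith("k8s://") and not HAS_KUBERNETES:
        raise ImportError(
            "Spark master {!r} requires the Kubernetes client libraries "
            "(pip install kubernetes)".format(master)
        )
    return master
```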
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 4 Aug 2018, at 22:32, Bolke de Bruin <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Bump.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Committers please cast your vote.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> B.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 3 Aug 2018, at 13:23, Driesprong, Fokko <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +1 Binding
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers, Fokko
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin <[email protected]>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have cut Airflow 1.10.0 RC3. This email is calling a vote on the release, which will last for 72 hours. Consider this my (binding) +1.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Airflow 1.10.0 RC3 is available at:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release that comes with INSTALL instructions.
>>>>>>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python "sdist" release.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Public keys are available at:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The number of JIRAs fixed is over 700. Please have a look at the changelog.
>>>>>>>>>>>>>>>>> Since RC2 the following has been fixed:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * [AIRFLOW-2817] Force explicit choice on GPL dependency
>>>>>>>>>>>>>>>>> * [AIRFLOW-2716] Replace async and await py3.7 keywords
>>>>>>>>>>>>>>>>> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please note that the version number excludes the `rcX` string as well as the "+incubating" string, so it's now simply 1.10.0. This will allow us to rename the artifact without modifying the artifact checksums when we actually release.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> WARNING: Due to licensing requirements you will need to set SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when installing or upgrading. We will try to remove this requirement for the next release.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Bolke
>>>
>>> Kaxil Naik
>>>
>>> Data Reply
>>> 2nd Floor, Nova South
>>> 160 Victoria Street, Westminster
>>> London SW1E 5LB - UK
>>> phone: +44 (0)20 7730 6000
>>> [email protected]
>>> www.reply.com
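To make the UTCDateTime discussion above concrete, here is a minimal sketch, under stated assumptions, of the two fixes the thread converges on: a DateTime type that normalises result values back to UTC (the kind of replacement for sqlalchemy-utc that Bolke describes, not the actual PR), and pinning every Postgres connection to UTC, as in Ash's set-timezone-to-utc branch. Class and connection names are illustrative, and plain Python 3 stdlib is used instead of Airflow's own helpers.

```
# Illustrative sketch only - not the code that shipped in Airflow 1.10.
from datetime import timezone

from sqlalchemy import DateTime, TypeDecorator, create_engine


class UtcDateTime(TypeDecorator):
    """Store timezone-aware datetimes as UTC and always return them as UTC,
    even when the driver (e.g. psycopg2) attaches the connection's offset."""

    impl = DateTime(timezone=True)

    def process_bind_param(self, value, dialect):
        if value is not None:
            if value.tzinfo is None:
                raise ValueError("naive datetime passed to UtcDateTime")
            value = value.astimezone(timezone.utc)
        return value

    def process_result_value(self, value, dialect):
        # The step sqlalchemy-utc skips for tz-aware values: normalise
        # whatever offset the DB connection used back to UTC.
        if value is not None and value.tzinfo is not None:
            value = value.astimezone(timezone.utc)
        return value


# Alternatively (or additionally), pin every Postgres connection to UTC so the
# server never converts results to its own default timezone:
engine = create_engine(
    "postgresql+psycopg2://localhost/airflow",
    connect_args={"options": "-c timezone=utc"},
)
```

Either way, the datetimes Airflow reads stop depending on the database server's default timezone, which is the inconsistency Ash hit with a GB-defaulted Postgres.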
