Done. When I roll rc4 it will be part of it.
> On 7 Aug 2018, at 16:26, Naik Kaxil <[email protected]> wrote: > > @bolke Can we also include the following commit to 1.10 release as we would > need this commit to generate docs at ReadTheDocs? > > - > https://github.com/apache/incubator-airflow/commit/8af0aa96bfe3caa51d67ab393db069d37b0c4169 > > Regards, > Kaxil > > On 06/08/2018, 14:59, "James Meickle" <[email protected]> > wrote: > > Not a vote, but a comment: it might be worth noting that the new > environment variable is also required if you have any Airflow plugin test > suites that install Airflow as part of their dependencies. In my case, I > had to set the new env var outsidfe of tox and add this: > > ``` > [testenv] > passenv = SLUGIFY_USES_TEXT_UNIDECODE > ``` > > (`setenv` did not work as that provides env vars at runtime but not > installtime, as far as I can tell.) > > > On Sun, Aug 5, 2018 at 5:20 PM Bolke de Bruin <[email protected]> wrote: > >> +1 :-) >> >> Sent from my iPhone >> >>> On 5 Aug 2018, at 23:08, Ash Berlin-Taylor < >> [email protected]> wrote: >>> >>> Yup, just worked out the same thing. >>> >>> I think as "punishment" for me finding bugs so late in two RCs (this, >> and 1.9) I should run the release for the next release. >>> >>> -ash >>> >>>> On 5 Aug 2018, at 22:05, Bolke de Bruin <[email protected]> wrote: >>>> >>>> Yeah I figured it out. Originally i was using a different >> implementation of UTCDateTime, but that was unmaintained. I switched, but >> this version changed or has a different contract. While it transforms on >> storing to UTC it does not so when it receives timezone aware fields from >> the db. Hence the issue. >>>> >>>> I will prepare a PR that removes the dependency and implements our own >> extension of DateTime. Probably tomorrow. >>>> >>>> Good catch! Just in time :-(. >>>> >>>> B. >>>> >>>>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor < >> [email protected]> wrote: >>>>> >>>>> Entirely possible, though I wasn't even dealing with the scheduler - >> the issue I was addressing was entirely in the webserver for a pre-existing >> Task Instance. >>>>> >>>>> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears >> that isn't working right/ as expected. This line: >> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34 >> doens't look right for us - as you mentioned the TZ is set to something >> (rather than having no TZ value). >>>>> >>>>> Some background on how Pq handles TZs. It always returns DTs in the TZ >> of the connection. I'm not sure if this is unique to postgres or if other >> DBs behave the same. >>>>> >>>>> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time >> zone; >>>>> timestamptz >>>>> ------------------------ >>>>> 2018-08-03 01:00:00+01 >>>>> >>>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone; >>>>> timestamptz >>>>> ------------------------ >>>>> 2018-08-03 01:00:00+01 >>>>> >>>>> The server will always return TZs in the connection timezone. >>>>> >>>>> postgres=# set timezone=utc; >>>>> SET >>>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone; >>>>> timestamptz >>>>> ------------------------ >>>>> 2018-08-03 00:00:00+00 >>>>> (1 row) >>>>> >>>>> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone; >>>>> timestamptz >>>>> ------------------------ >>>>> 2018-08-03 00:00:00+00 >>>>> (1 row) >>>>> >>>>> >>>>> >>>>> >>>>> -ash >>>>> >>>>>> On 5 Aug 2018, at 21:28, Bolke de Bruin <[email protected]> wrote: >>>>>> >>>>>> This is the issue: >>>>>> >>>>>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: >> 2018-08-03 00:00:00+00:00 tzinfo: <Timezone [UTC]> >>>>>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun >> example_http_operator @ 2018-08-03 02:00:00+02:00: >> scheduled__2018-08-03T00:00:00+00:00, externally triggered: False> >>>>>> >>>>>> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: >> 2018-08-04 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, >> name=None) >>>>>> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun >> example_http_operator @ 2018-08-04 02:00:00+02:00: >> scheduled__2018-08-04T02:00:00+02:00, externally triggered: False> >>>>>> >>>>>> Notice at line 1+2: that the next run date is correctly in UTC but >> from the DB it gets a +2. At the next bit (3+4) we get a >> psycopg2.tz.FixedOffsetTimezone >> which should be set to UTC according to the specs of >> https://github.com/spoqa/sqlalchemy-utc < >> https://github.com/spoqa/sqlalchemy-utc> , but it isn’t. >>>>>> >>>>>> So changing your setting of the DB to UTC fixes the symptom but not >> the cause. >>>>>> >>>>>> B. >>>>>> >>>>>>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor < >> [email protected]> wrote: >>>>>>> >>>>>>> Sorry for being terse before. >>>>>>> >>>>>>> So the issue is that the ts loaded from the DB is not in UTC, it's >> in GB/+01 (the default of the DB server) >>>>>>> >>>>>>> For me, on a currently running 1.9 (no TZ) db: >>>>>>> >>>>>>> airflow=# select * from task_instance; >>>>>>> get_op | example_http_operator | 2018-07-23 00:00:00 >>>>>>> >>>>>>> This date time appears in the log url, and the path it looks at on >> S3 is >>>>>>> >>>>>>> .../example_http_operator/2018-07-23T00:00:00/1.log >>>>>>> >>>>>>> If my postgres server has a default timezone of GB (which the one >> running on my laptop does), and I then apply the migration then it is >> converted to that local time. >>>>>>> >>>>>>> airflow=# select * from task_instance; >>>>>>> get_op | example_http_operator | 2018-07-23 01:00:00+01 >>>>>>> >>>>>>> airflow=# set timezone=UTC; >>>>>>> airflow=# select * from task_instance; >>>>>>> get_op | example_http_operator | 2018-07-23 00:00:00+00 >>>>>>> >>>>>>> >>>>>>> This is all okay so far. The migration has kept the column at the >> same moment in time. >>>>>>> >>>>>>> The issue come when the UI tries to display logs for this old task: >> because the timezone of the connection is not UTC, PG returns a date with a >> +01 TZ. Thus after the migration this old task tries to look for a log file >> of >>>>>>> >>>>>>> .../example_http_operator/2018-07-23T01:00:00/1.log >>>>>>> >>>>>>> which doesn't exist - it's changed the time it has rendered from >> midnight (in v1.9) to 1am (in v1.10). >>>>>>> >>>>>>> (This is with my change to log_filename_template from UPDATING.md in >> my other branch) >>>>>>> >>>>>>> Setting the timezone to UTC per connection means the behaviour of >> Airflow doesn't change depending on how the server is configured. >>>>>>> >>>>>>> -ash >>>>>>> >>>>>>>> On 5 Aug 2018, at 20:58, Bolke de Bruin <[email protected]> wrote: >>>>>>>> >>>>>>>> Digging in a bit further. >>>>>>>> >>>>>>>> {{{{ ti.dag_id }}}}/{{{{ ti.task_id }}}}/{{{{ ts }}}}/{{{{ >> try_number }}}}.log >>>>>>>> >>>>>>>> is the format >>>>>>>> >>>>>>>> ts = execution_date.isoformat and should be in UTC afaik. >>>>>>>> >>>>>>>> something is weird tbh. >>>>>>>> >>>>>>>> B. >>>>>>>> >>>>>>>> >>>>>>>>> On 5 Aug 2018, at 21:32, Bolke de Bruin <[email protected]> wrote: >>>>>>>>> >>>>>>>>> Ash, >>>>>>>>> >>>>>>>>> Reading your proposed changes on your “set-timezone-to-utc” branch >> and below analysis, I am not sure what you are perceiving as an issue. >>>>>>>>> >>>>>>>>> For conversion we assume everything is stored in UTC and in a >> naive format. Conversion then adds the timezone information. This results >> in the following >>>>>>>>> >>>>>>>>> postgres timezone = “Europe/Amsterdam” >>>>>>>>> >>>>>>>>> >>>>>>>>> airflow=# select * from task_instance; >>>>>>>>> get_op | example_http_operator | 2018-07-27 02:00:00+02 >>>>>>>>> >>>>>>>>> airflow=# set timezone=UTC; >>>>>>>>> airflow=# select * from task_instance; >>>>>>>>> get_op | example_http_operator | 2018-07-27 00:00:00+00 >>>>>>>>> >>>>>>>>> If we don’t set the timezone in the connection postgres assumes >> server timezone (in my case “Europe/Amsterdam”). So every datetime Airflow >> receives will be in “Europe/Amsterdam” format. However as we defined the >> model to use UTCDateTime it will always convert the returned DateTime to >> UTC. >>>>>>>>> >>>>>>>>> If we have configured Airflow to support something else as UTC as >> the default timezone or a DAG has a associated timezone we only convert to >> that timezone when calculating the next runtime (not for cron btw). Nowhere >> else and thus we are UTC everywhere. >>>>>>>>> >>>>>>>>> What do you think is inconsistent? >>>>>>>>> >>>>>>>>> Bolke >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor < >> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>> Relating to 2): I'm not sure that the upgrade from timezoneless >> to timezone aware colums in the task instance is right, or at least it's >> not what I expected. >>>>>>>>>> >>>>>>>>>> Before weren't all TZs from schedule dates etc in UTC? For the >> same task instance (these outputs from psql directly): >>>>>>>>>> >>>>>>>>>> before: execution_date=2017-09-04 00:00:00 >>>>>>>>>> after: execution_date=2017-09-04 01:00:00+01 >>>>>>>>>> >>>>>>>>>> **Okay the migration is fine**. It appears that the migration has >> done the right thing, but my local DB I'm testing with has a Timezone of GB >> set, so Postgres converts it to that TZ on returning an object. >>>>>>>>>> >>>>>>>>>> 3) Do we need to set the TZ of the connection to UTC in >> SQLAlchemy to have consistent behaviour? Is this possible some how? I don't >> know SQLAlchemy that well. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -ash >>>>>>>>>> >>>>>>>>>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor < >> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>> 1.) Missing UPDATING note about change of task_log_reader to now >> always being "task" (was "s3.task" before.). Logging config is much simpler >> now though. This may be particular to my logging config, but given how much >> of a pain it was to set up S3 logging in 1.9 I have shared my config with >> some people in the Gitter chat so It's not just me. >>>>>>>>>>> >>>>>>>>>>> 2) The path that log-files are written to in S3 has changed >> (again - this happened from 1.8 to 1.9). I'd like to avoid having to move >> all of my log files again to continue viewing them. The change is that the >> path now (in 1.10) has a timezone in it, and the date is in local time, >> before it was UTC: >>>>>>>>>>> >>>>>>>>>>> before: 2018-07-23T00:00:00/1.log >>>>>>>>>>> after: 2018-07-23T01:00:00+01:00/1.log >>>>>>>>>>> >>>>>>>>>>> We can possibly get away with an updating note about this to set >> a custom log_filename_template. Testing this now. >>>>>>>>>>> >>>>>>>>>>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor <[email protected]> >> wrote: >>>>>>>>>>>> >>>>>>>>>>>> -1(binding) from me. >>>>>>>>>>>> >>>>>>>>>>>> Installed with: >>>>>>>>>>>> >>>>>>>>>>>> AIRFLOW_GPL_UNIDECODE=yes pip install ' >> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr >> < >> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr>, >> s3, crypto]>=1.10' >>>>>>>>>>>> >>>>>>>>>>>> Install went fine. >>>>>>>>>>>> >>>>>>>>>>>> Our DAGs that use SparkSubmitOperator are now failing as there >> is now a hard dependency on the Kubernetes client libs, but the `emr` group >> doesn't mention this. >>>>>>>>>>>> >>>>>>>>>>>> Introduced in >> https://github.com/apache/incubator-airflow/pull/3112 < >> https://github.com/apache/incubator-airflow/pull/3112> >>>>>>>>>>>> >>>>>>>>>>>> I see two options for this - either conditionally enable k8s:// >> support if the import works, or (less preferred) add kube-client to the emr >> deps (which I like less) >>>>>>>>>>>> >>>>>>>>>>>> Sorry - this is the first time I've been able to test it. >>>>>>>>>>>> >>>>>>>>>>>> I will install this dep manually and continue testing. >>>>>>>>>>>> >>>>>>>>>>>> -ash >>>>>>>>>>>> >>>>>>>>>>>> (Normally no time at home due to new baby, but I got a standing >> desk, and a carrier meaning she can sleep on me and I can use my laptop. >> Win!) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> On 4 Aug 2018, at 22:32, Bolke de Bruin <[email protected] >> <mailto:[email protected]>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Bump. >>>>>>>>>>>>> >>>>>>>>>>>>> Committers please cast your vote. >>>>>>>>>>>>> >>>>>>>>>>>>> B. >>>>>>>>>>>>> >>>>>>>>>>>>> Sent from my iPhone >>>>>>>>>>>>> >>>>>>>>>>>>>> On 3 Aug 2018, at 13:23, Driesprong, Fokko >> <[email protected] <mailto:[email protected]>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> +1 Binding >>>>>>>>>>>>>> >>>>>>>>>>>>>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip >> install >>>>>>>>>>>>>> >> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz >> < >> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz >>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, Fokko >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin <[email protected]>: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hey all, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have cut Airflow 1.10.0 RC3. This email is calling a vote >> on the release, >>>>>>>>>>>>>>> which will last for 72 hours. Consider this my (binding) +1. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Airflow 1.10.0 RC 3 is available at: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/ < >>>>>>>>>>>>>>> >> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a >> source release that >>>>>>>>>>>>>>> comes with INSTALL instructions. >>>>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary >> Python >>>>>>>>>>>>>>> "sdist" >>>>>>>>>>>>>>> release. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Public keys are available at: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >> https://dist.apache.org/repos/dist/release/incubator/airflow/ < >>>>>>>>>>>>>>> >> https://dist.apache.org/repos/dist/release/incubator/airflow/> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The amount of JIRAs fixed is over 700. Please have a look at >> the >>>>>>>>>>>>>>> changelog. >>>>>>>>>>>>>>> Since RC2 the following has been fixed: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> * [AIRFLOW-2817] Force explicit choice on GPL dependency >>>>>>>>>>>>>>> * [AIRFLOW-2716] Replace async and await py3.7 keywords >>>>>>>>>>>>>>> * [AIRFLOW-2810] Fix typo in Xcom model timestamp >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please note that the version number excludes the `rcX` >> string as well >>>>>>>>>>>>>>> as the "+incubating" string, so it's now simply 1.10.0. This >> will allow us >>>>>>>>>>>>>>> to rename the artifact without modifying the artifact >> checksums when we >>>>>>>>>>>>>>> actually release. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> WARNING: Due to licensing requirements you will need to set >>>>>>>>>>>>>>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when >>>>>>>>>>>>>>> installing or upgrading. We will try to remove this >> requirement for the >>>>>>>>>>>>>>> next release. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> Bolke >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > > > > > > > Kaxil Naik > > Data Reply > 2nd Floor, Nova South > 160 Victoria Street, Westminster > London SW1E 5LB - UK > phone: +44 (0)20 7730 6000 > [email protected] > www.reply.com
