+1 :-)

Sent from my iPhone

> On 5 Aug 2018, at 23:08, Ash Berlin-Taylor <[email protected]> 
> wrote:
> 
> Yup, just worked out the same thing.
> 
> I think as "punishment" for finding bugs so late in two RCs (this one, and 
> 1.9) I should run the release process for the next release.
> 
> -ash
> 
>> On 5 Aug 2018, at 22:05, Bolke de Bruin <[email protected]> wrote:
>> 
>> Yeah, I figured it out. Originally I was using a different implementation of 
>> UTCDateTime, but that was unmaintained. I switched, but this version changed 
>> or has a different contract: while it converts to UTC on storing, it does 
>> not do so when it receives timezone-aware fields from the DB. Hence the issue.
>> 
>> I will prepare a PR that removes the dependency and implements our own 
>> extension of DateTime. Probably tomorrow.
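>>
>> To sketch what I mean (a rough, hypothetical outline of such an extension,
>> using SQLAlchemy's TypeDecorator and the stdlib timezone for brevity - not
>> the actual PR):
>>
>> from datetime import timezone
>>
>> from sqlalchemy.types import TypeDecorator, TIMESTAMP
>>
>> class UtcDateTime(TypeDecorator):
>>     """Store datetimes as UTC and always hand them back as aware UTC."""
>>     impl = TIMESTAMP(timezone=True)
>>
>>     def process_bind_param(self, value, dialect):
>>         # Normalise to UTC before writing to the DB.
>>         if value is not None:
>>             if value.tzinfo is None:
>>                 raise ValueError("naive datetime not allowed")
>>             value = value.astimezone(timezone.utc)
>>         return value
>>
>>     def process_result_value(self, value, dialect):
>>         # This is the step sqlalchemy-utc skips: convert whatever tzinfo
>>         # the driver attached (e.g. psycopg2's FixedOffsetTimezone) back
>>         # to UTC.
>>         if value is not None and value.tzinfo is not None:
>>             value = value.astimezone(timezone.utc)
>>         return value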
>> 
>> Good catch! Just in time :-(.
>> 
>> B.
>> 
>>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor <[email protected]> 
>>> wrote:
>>> 
>>> Entirely possible, though I wasn't even dealing with the scheduler - the 
>>> issue I was addressing was entirely in the webserver for a pre-existing 
>>> Task Instance.
>>> 
>>> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears that 
>>> isn't working right/as expected. This line: 
>>> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
>>> doesn't look right for us - as you mentioned, the TZ is set to something 
>>> (rather than having no TZ value).
>>> 
>>> Some background on how Pg handles TZs: it always returns datetimes in the TZ 
>>> of the connection. I'm not sure if this is unique to Postgres or if other DBs 
>>> behave the same.
>>> 
>>> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time zone;
>>>    timestamptz
>>> ------------------------
>>> 2018-08-03 01:00:00+01
>>> 
>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>>    timestamptz
>>> ------------------------
>>> 2018-08-03 01:00:00+01
>>> 
>>> The server will always return timestamps in the connection timezone.
>>> 
>>> postgres=# set timezone=utc;
>>> SET
>>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>>    timestamptz
>>> ------------------------
>>> 2018-08-03 00:00:00+00
>>> (1 row)
>>> 
>>> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
>>>    timestamptz
>>> ------------------------
>>> 2018-08-03 00:00:00+00
>>> (1 row)
>>> 
>>> 
>>> 
>>> 
>>> -ash
>>> 
>>>> On 5 Aug 2018, at 21:28, Bolke de Bruin <[email protected]> wrote:
>>>> 
>>>> This is the issue:
>>>> 
>>>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 
>>>> 00:00:00+00:00 tzinfo: <Timezone [UTC]>
>>>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun 
>>>> example_http_operator @ 2018-08-03 02:00:00+02:00: 
>>>> scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
>>>> 
>>>> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 
>>>> 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, 
>>>> name=None)
>>>> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun 
>>>> example_http_operator @ 2018-08-04 02:00:00+02:00: 
>>>> scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
>>>> 
>>>> Notice in lines 1+2 that the next run date is correctly in UTC, but from 
>>>> the DB it comes back with a +2 offset. In the next bit (lines 3+4) we get a 
>>>> psycopg2.tz.FixedOffsetTimezone, which should be set to UTC according to 
>>>> the spec of https://github.com/spoqa/sqlalchemy-utc, but it isn’t.
>>>> 
>>>> So changing the DB setting to UTC fixes the symptom but not the cause.
>>>> 
>>>> B.
>>>> 
>>>>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor 
>>>>> <[email protected]> wrote:
>>>>> 
>>>>> Sorry for being terse before.
>>>>> 
>>>>> So the issue is that the ts loaded from the DB is not in UTC; it's in 
>>>>> GB/+01 (the default of the DB server).
>>>>> 
>>>>> For me, on a currently running 1.9 (no TZ) db:
>>>>> 
>>>>> airflow=# select * from task_instance;
>>>>> get_op            | example_http_operator | 2018-07-23 00:00:00
>>>>> 
>>>>> This datetime appears in the log URL, and the path it looks at on S3 is 
>>>>> 
>>>>> .../example_http_operator/2018-07-23T00:00:00/1.log
>>>>> 
>>>>> If my Postgres server has a default timezone of GB (which the one running 
>>>>> on my laptop does) and I then apply the migration, the value is shown in 
>>>>> that local time:
>>>>> 
>>>>> airflow=# select * from task_instance;
>>>>> get_op            | example_http_operator | 2018-07-23 01:00:00+01
>>>>> 
>>>>> airflow=# set timezone=UTC;
>>>>> airflow=# select * from task_instance;
>>>>> get_op            | example_http_operator | 2018-07-23 00:00:00+00
>>>>> 
>>>>> 
>>>>> This is all okay so far. The migration has kept the column at the same 
>>>>> moment in time.
>>>>> 
>>>>> The issue comes when the UI tries to display logs for this old task: 
>>>>> because the timezone of the connection is not UTC, PG returns a date with 
>>>>> a +01 TZ. Thus, after the migration, this old task tries to look for a log 
>>>>> file of
>>>>> 
>>>>> .../example_http_operator/2018-07-23T01:00:00/1.log
>>>>> 
>>>>> which doesn't exist - the rendered time has changed from midnight 
>>>>> (in v1.9) to 1am (in v1.10).
>>>>> 
>>>>> (This is with my change to log_filename_template from UPDATING.md in my 
>>>>> other branch)
>>>>> 
>>>>> Setting the timezone to UTC per connection means the behaviour of Airflow 
>>>>> doesn't change depending on how the server is configured.
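>>>>>
>>>>> For reference, one way this could be done (a minimal sketch, assuming a
>>>>> SQLAlchemy engine plus a connect-event hook - not necessarily what's on the
>>>>> branch):
>>>>>
>>>>> import sqlalchemy as sa
>>>>> from sqlalchemy import event
>>>>>
>>>>> # Hypothetical connection string; adjust for your environment.
>>>>> engine = sa.create_engine("postgresql+psycopg2://airflow@localhost/airflow")
>>>>>
>>>>> @event.listens_for(engine, "connect")
>>>>> def set_utc(dbapi_connection, connection_record):
>>>>>     # Force every new connection to UTC, regardless of the server's
>>>>>     # default timezone setting.
>>>>>     cursor = dbapi_connection.cursor()
>>>>>     cursor.execute("SET timezone TO 'UTC'")
>>>>>     cursor.close()
>>>>>
>>>>> (psycopg2 can also do this via connect_args={"options": "-c timezone=utc"},
>>>>> I believe.)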
>>>>> 
>>>>> -ash
>>>>> 
>>>>>> On 5 Aug 2018, at 20:58, Bolke de Bruin <[email protected]> wrote:
>>>>>> 
>>>>>> Digging in a bit further. 
>>>>>> 
>>>>>> The log filename format is:
>>>>>>
>>>>>> {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
>>>>>>
>>>>>> ts = execution_date.isoformat() and should be in UTC afaik.
>>>>>> 
>>>>>> something is weird tbh.
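>>>>>>
>>>>>> To illustrate the kind of difference we're seeing (a toy example, not
>>>>>> Airflow code - just isoformat on the same instant with different tzinfo):
>>>>>>
>>>>>> from datetime import datetime, timedelta, timezone
>>>>>>
>>>>>> utc_dt = datetime(2018, 7, 23, 0, 0, tzinfo=timezone.utc)
>>>>>> local_dt = utc_dt.astimezone(timezone(timedelta(hours=1)))
>>>>>>
>>>>>> print(utc_dt.isoformat())    # 2018-07-23T00:00:00+00:00
>>>>>> print(local_dt.isoformat())  # 2018-07-23T01:00:00+01:00 -> a different log path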
>>>>>> 
>>>>>> B.
>>>>>> 
>>>>>> 
>>>>>>> On 5 Aug 2018, at 21:32, Bolke de Bruin <[email protected]> wrote:
>>>>>>> 
>>>>>>> Ash,
>>>>>>> 
>>>>>>> Reading your proposed changes on your “set-timezone-to-utc” branch and 
>>>>>>> below analysis, I am not sure what you are perceiving as an issue.
>>>>>>> 
>>>>>>> For conversion we assume everything is stored in UTC and in a naive 
>>>>>>> format. Conversion then adds the timezone information. This results in 
>>>>>>> the following
>>>>>>> 
>>>>>>> postgres timezone = “Europe/Amsterdam”
>>>>>>> 
>>>>>>> 
>>>>>>> airflow=# select * from task_instance;
>>>>>>> get_op            | example_http_operator | 2018-07-27 02:00:00+02
>>>>>>> 
>>>>>>> airflow=# set timezone=UTC;
>>>>>>> airflow=# select * from task_instance;
>>>>>>> get_op            | example_http_operator | 2018-07-27 00:00:00+00
>>>>>>> 
>>>>>>> If we don’t set the timezone on the connection, Postgres assumes the server 
>>>>>>> timezone (in my case “Europe/Amsterdam”). So every datetime Airflow 
>>>>>>> receives will be in “Europe/Amsterdam”. However, as we defined 
>>>>>>> the model to use UTCDateTime, it will always convert the returned 
>>>>>>> DateTime to UTC.
>>>>>>> 
>>>>>>> If we have configured Airflow with a default timezone other than UTC, or a 
>>>>>>> DAG has an associated timezone, we only convert to that timezone when 
>>>>>>> calculating the next run time (not for cron, btw). Nowhere else, and thus 
>>>>>>> we are UTC everywhere.
>>>>>>> 
>>>>>>> What do you think is inconsistent?
>>>>>>> 
>>>>>>> Bolke
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor 
>>>>>>>> <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Relating to 2): I'm not sure that the upgrade from timezoneless to 
>>>>>>>> timezone-aware columns in the task instance is right, or at least it's 
>>>>>>>> not what I expected.
>>>>>>>> 
>>>>>>>> Before weren't all TZs from schedule dates etc in UTC? For the same 
>>>>>>>> task instance (these outputs from psql directly):
>>>>>>>> 
>>>>>>>> before: execution_date=2017-09-04 00:00:00
>>>>>>>> after: execution_date=2017-09-04 01:00:00+01
>>>>>>>> 
>>>>>>>> **Okay, the migration is fine**. It appears that the migration has done 
>>>>>>>> the right thing, but the local DB I'm testing with has a timezone of GB 
>>>>>>>> set, so Postgres converts it to that TZ when returning an object.
>>>>>>>> 
>>>>>>>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to 
>>>>>>>> have consistent behaviour? Is this possible somehow? I don't know 
>>>>>>>> SQLAlchemy that well.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -ash
>>>>>>>> 
>>>>>>>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor 
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> 1) Missing UPDATING note about the change of task_log_reader to now 
>>>>>>>>> always being "task" (it was "s3.task" before). Logging config is much 
>>>>>>>>> simpler now though. This may be particular to my logging config, but 
>>>>>>>>> given how much of a pain it was to set up S3 logging in 1.9, I have 
>>>>>>>>> shared my config with some people in the Gitter chat, so it's not just 
>>>>>>>>> me.
>>>>>>>>> 
>>>>>>>>> 2) The path that log files are written to in S3 has changed (again - 
>>>>>>>>> this also happened from 1.8 to 1.9). I'd like to avoid having to move all 
>>>>>>>>> of my log files again to continue viewing them. The change is that 
>>>>>>>>> the path now (in 1.10) has a timezone in it, and the date is in local 
>>>>>>>>> time, whereas before it was UTC:
>>>>>>>>> 
>>>>>>>>> before: 2018-07-23T00:00:00/1.log
>>>>>>>>> after: 2018-07-23T01:00:00+01:00/1.log
>>>>>>>>> 
>>>>>>>>> We can possibly get away with an updating note about this to set a 
>>>>>>>>> custom log_filename_template. Testing this now.
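>>>>>>>>>
>>>>>>>>> Purely to illustrate the idea (a toy jinja2 rendering with made-up
>>>>>>>>> variable names, not the actual context Airflow passes to
>>>>>>>>> log_filename_template): forcing the execution date back to UTC and
>>>>>>>>> dropping the offset reproduces the old-style path:
>>>>>>>>>
>>>>>>>>> from datetime import datetime, timedelta, timezone
>>>>>>>>> import jinja2
>>>>>>>>>
>>>>>>>>> template = jinja2.Template(
>>>>>>>>>     "{{ dag_id }}/{{ task_id }}/"
>>>>>>>>>     "{{ execution_date.astimezone(utc).strftime('%Y-%m-%dT%H:%M:%S') }}/"
>>>>>>>>>     "{{ try_number }}.log"
>>>>>>>>> )
>>>>>>>>> print(template.render(
>>>>>>>>>     dag_id="example_http_operator",
>>>>>>>>>     task_id="get_op",
>>>>>>>>>     execution_date=datetime(2018, 7, 23, 1, 0,
>>>>>>>>>                             tzinfo=timezone(timedelta(hours=1))),
>>>>>>>>>     try_number=1,
>>>>>>>>>     utc=timezone.utc,
>>>>>>>>> ))
>>>>>>>>> # example_http_operator/get_op/2018-07-23T00:00:00/1.log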
>>>>>>>>> 
>>>>>>>>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor <[email protected]> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> -1(binding) from me.
>>>>>>>>>> 
>>>>>>>>>> Installed with:
>>>>>>>>>> 
>>>>>>>>>> AIRFLOW_GPL_UNIDECODE=yes pip install 
>>>>>>>>>> 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr, s3, crypto]>=1.10'
>>>>>>>>>> 
>>>>>>>>>> Install went fine.
>>>>>>>>>> 
>>>>>>>>>> Our DAGs that use SparkSubmitOperator are now failing, as there is 
>>>>>>>>>> now a hard dependency on the Kubernetes client libs, but the `emr` 
>>>>>>>>>> extras group doesn't pull them in.
>>>>>>>>>> 
>>>>>>>>>> Introduced in https://github.com/apache/incubator-airflow/pull/3112
>>>>>>>>>> 
>>>>>>>>>> I see two options for this: either conditionally enable k8s:// 
>>>>>>>>>> support only if the kubernetes import works, or (less preferred) add 
>>>>>>>>>> the kubernetes client to the emr deps.
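>>>>>>>>>>
>>>>>>>>>> Roughly what I mean by the first option (a sketch only, with made-up
>>>>>>>>>> names - not the actual operator code):
>>>>>>>>>>
>>>>>>>>>> # Only wire up k8s:// master support when the client is importable.
>>>>>>>>>> try:
>>>>>>>>>>     import kubernetes  # noqa: F401
>>>>>>>>>>     HAS_KUBERNETES = True
>>>>>>>>>> except ImportError:
>>>>>>>>>>     HAS_KUBERNETES = False
>>>>>>>>>>
>>>>>>>>>> def validate_master(master):
>>>>>>>>>>     # Hypothetical helper: only reject k8s:// masters when the
>>>>>>>>>>     # kubernetes client libs are missing; everything else keeps
>>>>>>>>>>     # working without them.
>>>>>>>>>>     if master.startswith("k8s://") and not HAS_KUBERNETES:
>>>>>>>>>>         raise ImportError(
>>>>>>>>>>             "spark master %s needs the kubernetes python client "
>>>>>>>>>>             "installed" % master
>>>>>>>>>>         )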
>>>>>>>>>> 
>>>>>>>>>> Sorry - this is the first time I've been able to test it.
>>>>>>>>>> 
>>>>>>>>>> I will install this dep manually and continue testing.
>>>>>>>>>> 
>>>>>>>>>> -ash
>>>>>>>>>> 
>>>>>>>>>> (Normally no time at home due to new baby, but I got a standing 
>>>>>>>>>> desk, and a carrier meaning she can sleep on me and I can use my 
>>>>>>>>>> laptop. Win!)
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On 4 Aug 2018, at 22:32, Bolke de Bruin <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Bump. 
>>>>>>>>>>> 
>>>>>>>>>>> Committers please cast your vote. 
>>>>>>>>>>> 
>>>>>>>>>>> B.
>>>>>>>>>>> 
>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>> 
>>>>>>>>>>>> On 3 Aug 2018, at 13:23, Driesprong, Fokko <[email protected]> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> +1 Binding
>>>>>>>>>>>> 
>>>>>>>>>>>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
>>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers, Fokko
>>>>>>>>>>>> 
>>>>>>>>>>>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin <[email protected]>:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I have cut Airflow 1.10.0 RC3. This email is calling a vote on 
>>>>>>>>>>>>> the release,
>>>>>>>>>>>>> which will last for 72 hours. Consider this my (binding) +1.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Airflow 1.10.0 RC 3 is available at:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/
>>>>>>>>>>>>> 
>>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source 
>>>>>>>>>>>>> release that
>>>>>>>>>>>>> comes with INSTALL instructions.
>>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python 
>>>>>>>>>>>>> "sdist" release.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Public keys are available at:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The number of JIRAs fixed is over 700. Please have a look at the
>>>>>>>>>>>>> changelog.
>>>>>>>>>>>>> Since RC2 the following have been fixed:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> * [AIRFLOW-2817] Force explicit choice on GPL dependency
>>>>>>>>>>>>> * [AIRFLOW-2716] Replace async and await py3.7 keywords
>>>>>>>>>>>>> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Please note that the version number excludes the `rcX` string as 
>>>>>>>>>>>>> well
>>>>>>>>>>>>> as the "+incubating" string, so it's now simply 1.10.0. This will 
>>>>>>>>>>>>> allow us
>>>>>>>>>>>>> to rename the artifact without modifying the artifact checksums 
>>>>>>>>>>>>> when we
>>>>>>>>>>>>> actually release.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> WARNING: Due to licensing requirements you will need to set
>>>>>>>>>>>>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
>>>>>>>>>>>>> installing or upgrading. We will try to remove this requirement 
>>>>>>>>>>>>> for the
>>>>>>>>>>>>> next release.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Bolke
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
