Yeah I figured it out. Originally I was using a different implementation of 
UTCDateTime, but that was unmaintained. I switched, but this version has a 
different contract: while it converts to UTC on storing, it does not do so 
when it receives timezone-aware fields from the DB. Hence the issue.

I will prepare a PR that removes the dependency and implements our own 
extension of DateTime. Probably tomorrow.
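
Roughly, the idea is a TypeDecorator that normalises to UTC on the way in
*and* on the way out. A minimal sketch of the idea (not the actual PR; we may
well use pendulum rather than the stdlib):

import datetime

from sqlalchemy.types import DateTime, TypeDecorator


class UTCDateTime(TypeDecorator):
    """Timezone-aware DateTime column that is UTC in both directions."""

    impl = DateTime(timezone=True)

    def process_bind_param(self, value, dialect):
        # Storing: require tz-aware values and normalise them to UTC.
        if value is None:
            return None
        if value.tzinfo is None:
            raise ValueError("naive datetimes are not allowed")
        return value.astimezone(datetime.timezone.utc)

    def process_result_value(self, value, dialect):
        # Loading: the driver may attach the *connection's* timezone
        # (e.g. psycopg2's FixedOffsetTimezone), so convert back to UTC
        # here - the step sqlalchemy-utc skips.
        if value is not None and value.tzinfo is not None:
            value = value.astimezone(datetime.timezone.utc)
        return value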

Good catch! Just in time :-(.

B.

> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor <ash_airflowl...@firemirror.com> 
> wrote:
> 
> Entirely possible, though I wasn't even dealing with the scheduler - the 
> issue I was addressing was purely in the webserver, for a pre-existing Task 
> Instance.
> 
> Ah, I hadn't noticed/twigged we are using sqlalchemy-utc. It appears that 
> isn't working right / as expected. This line: 
> https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
> doesn't look right for us - as you mentioned, the TZ is set to something 
> (rather than having no TZ value).
> 
> Some background on how Postgres handles TZs: it always returns datetimes in 
> the TZ of the connection. I'm not sure if this is unique to Postgres or if 
> other DBs behave the same.
> 
> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time zone;
>      timestamptz
> ------------------------
> 2018-08-03 01:00:00+01
> 
> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>      timestamptz
> ------------------------
> 2018-08-03 01:00:00+01
> 
> The server will always return timestamps in the connection timezone.
> 
> postgres=# set timezone=utc;
> SET
> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>      timestamptz
> ------------------------
> 2018-08-03 00:00:00+00
> (1 row)
> 
> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
>      timestamptz
> ------------------------
> 2018-08-03 00:00:00+00
> (1 row)
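> 
> One way to do the same per connection from SQLAlchemy would be a 
> connect-event listener - a minimal sketch (hypothetical; not necessarily 
> what my set-timezone-to-utc branch does, and the connection URL is made up):
> 
> from sqlalchemy import create_engine, event
> 
> engine = create_engine("postgresql://airflow@localhost/airflow")
> 
> @event.listens_for(engine, "connect")
> def set_utc(dbapi_connection, connection_record):
>     # Runs once for every new DBAPI connection, before it is handed out,
>     # so every session sees timestamps in UTC regardless of server config.
>     cursor = dbapi_connection.cursor()
>     cursor.execute("SET timezone TO 'UTC'")
>     cursor.close()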
> 
> -ash
> 
>> On 5 Aug 2018, at 21:28, Bolke de Bruin <bdbr...@gmail.com> wrote:
>> 
>> This is the issue:
>> 
>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 
>> 00:00:00+00:00 tzinfo: <Timezone [UTC]>
>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun 
>> example_http_operator @ 2018-08-03 02:00:00+02:00: 
>> scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
>> 
>> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 
>> 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)
>> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun 
>> example_http_operator @ 2018-08-04 02:00:00+02:00: 
>> scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
>> 
>> Notice in the first two log lines that the next run date is correctly in 
>> UTC, but from the DB it gets a +2. In the next two we get a 
>> psycopg2.tz.FixedOffsetTimezone, which should be set to UTC according to 
>> the specs of https://github.com/spoqa/sqlalchemy-utc, but it isn't. 
>> 
>> So your change setting the DB connection to UTC fixes the symptom but not 
>> the cause.
>> 
>> B.
>> 
>>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor <ash_airflowl...@firemirror.com> 
>>> wrote:
>>> 
>>> Sorry for being terse before.
>>> 
>>> So the issue is that the ts loaded from the DB is not in UTC; it's in 
>>> GB/+01 (the default of the DB server).
>>> 
>>> For me, on a currently running 1.9 (no TZ) db:
>>> 
>>> airflow=# select * from task_instance;
>>> get_op            | example_http_operator | 2018-07-23 00:00:00
>>> 
>>> This datetime appears in the log URL, and the path it looks at on S3 is 
>>> 
>>> .../example_http_operator/2018-07-23T00:00:00/1.log
>>> 
>>> If my postgres server has a default timezone of GB (which the one running 
>>> on my laptop does), and I then apply the migration, it is converted to 
>>> that local time.
>>> 
>>> airflow=# select * from task_instance;
>>> get_op            | example_http_operator | 2018-07-23 01:00:00+01
>>> 
>>> airflow=# set timezone=UTC;
>>> airflow=# select * from task_instance;
>>> get_op            | example_http_operator | 2018-07-23 00:00:00+00
>>> 
>>> 
>>> This is all okay so far. The migration has kept the column at the same 
>>> moment in time.
>>> 
>>> The issue comes when the UI tries to display logs for this old task: because 
>>> the timezone of the connection is not UTC, PG returns a date with a +01 TZ. 
>>> Thus after the migration this old task tries to look for a log file of
>>> 
>>> .../example_http_operator/2018-07-23T01:00:00/1.log
>>> 
>>> which doesn't exist - the rendered time has changed from midnight 
>>> (in v1.9) to 1am (in v1.10).
>>> 
>>> (This is with my change to log_filename_template from UPDATING.md in my 
>>> other branch)
>>> 
>>> Setting the timezone to UTC per connection means the behaviour of Airflow 
>>> doesn't change depending on how the server is configured.
>>> 
>>> -ash
>>> 
>>>> On 5 Aug 2018, at 20:58, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>> 
>>>> Digging in a bit further. 
>>>> 
>>>> {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
>>>> 
>>>> is the format.
>>>> 
>>>> ts = execution_date.isoformat() and should be in UTC afaik.
>>>> 
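>>>> To illustrate with a quick Python 3 stdlib example (mine, not from the 
>>>> code): the same instant renders differently depending on the tzinfo 
>>>> attached, which is exactly the difference we see in the log path:
>>>> 
>>>> import datetime
>>>> 
>>>> utc = datetime.timezone.utc
>>>> plus2 = datetime.timezone(datetime.timedelta(hours=2))
>>>> 
>>>> dt = datetime.datetime(2018, 8, 3, tzinfo=utc)
>>>> print(dt.isoformat())                    # 2018-08-03T00:00:00+00:00
>>>> print(dt.astimezone(plus2).isoformat())  # 2018-08-03T02:00:00+02:00
>>>> 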
>>>> Something is weird, tbh.
>>>> 
>>>> B.
>>>> 
>>>> 
>>>>> On 5 Aug 2018, at 21:32, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>> 
>>>>> Ash,
>>>>> 
>>>>> Reading your proposed changes on your “set-timezone-to-utc” branch and 
>>>>> the analysis below, I am not sure what you are perceiving as an issue.
>>>>> 
>>>>> For conversion we assume everything is stored in UTC and in a naive 
>>>>> format. Conversion then adds the timezone information. This results in 
>>>>> the following:
>>>>> 
>>>>> postgres timezone = “Europe/Amsterdam”
>>>>> 
>>>>> 
>>>>> airflow=# select * from task_instance;
>>>>> get_op            | example_http_operator | 2018-07-27 02:00:00+02
>>>>> 
>>>>> airflow=# set timezone=UTC;
>>>>> airflow=# select * from task_instance;
>>>>> get_op            | example_http_operator | 2018-07-27 00:00:00+00
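>>>>> 
>>>>> That is, a naive stored value is taken to be UTC and the timezone is 
>>>>> simply attached - a quick stdlib illustration of the assumption (mine, 
>>>>> not from the migration code):
>>>>> 
>>>>> import datetime
>>>>> 
>>>>> naive = datetime.datetime(2018, 7, 27, 0, 0)         # as stored, naive
>>>>> aware = naive.replace(tzinfo=datetime.timezone.utc)  # same instant, tz attached
>>>>> print(aware.isoformat())  # 2018-07-27T00:00:00+00:00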
>>>>> 
>>>>> If we don’t set the timezone on the connection, Postgres assumes the 
>>>>> server timezone (in my case “Europe/Amsterdam”). So every datetime 
>>>>> Airflow receives will be in “Europe/Amsterdam”. However, as we defined 
>>>>> the model to use UTCDateTime, it will always convert the returned 
>>>>> DateTime to UTC.
>>>>> 
>>>>> If we have configured Airflow with a default timezone other than UTC, or 
>>>>> a DAG has an associated timezone, we only convert to that timezone when 
>>>>> calculating the next run time (not for cron, btw). Nowhere else, and 
>>>>> thus we are UTC everywhere.
>>>>> 
>>>>> What do you think is inconsistent?
>>>>> 
>>>>> Bolke
>>>>> 
>>>>>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor 
>>>>>> <ash_airflowl...@firemirror.com> wrote:
>>>>>> 
>>>>>> Relating to 2): I'm not sure that the upgrade from timezoneless to 
>>>>>> timezone-aware columns in the task instance is right, or at least it's 
>>>>>> not what I expected.
>>>>>> 
>>>>>> Before, weren't all TZs from schedule dates etc. in UTC? For the same 
>>>>>> task instance (these outputs are from psql directly):
>>>>>> 
>>>>>> before: execution_date=2017-09-04 00:00:00
>>>>>> after: execution_date=2017-09-04 01:00:00+01
>>>>>> 
>>>>>> **Okay, the migration is fine.** It appears that the migration has done 
>>>>>> the right thing, but the local DB I'm testing with has a timezone of GB 
>>>>>> set, so Postgres converts it to that TZ when returning an object.
>>>>>> 
>>>>>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to 
>>>>>> have consistent behaviour? Is this possible somehow? I don't know 
>>>>>> SQLAlchemy that well.
>>>>>> 
>>>>>> 
>>>>>> -ash
>>>>>> 
>>>>>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor 
>>>>>>> <ash_airflowl...@firemirror.com> wrote:
>>>>>>> 
>>>>>>> 1) Missing UPDATING note about the change of task_log_reader to now 
>>>>>>> always be "task" (was "s3.task" before). Logging config is much simpler 
>>>>>>> now, though. This may be particular to my logging config, but given how 
>>>>>>> much of a pain it was to set up S3 logging in 1.9, I have shared my 
>>>>>>> config with some people in the Gitter chat, so it's not just me.
>>>>>>> 
>>>>>>> 2) The path that log files are written to in S3 has changed (again - 
>>>>>>> this happened from 1.8 to 1.9). I'd like to avoid having to move all of 
>>>>>>> my log files again to continue viewing them. The change is that the 
>>>>>>> path now (in 1.10) has a timezone in it, and the date is in local time, 
>>>>>>> where before it was UTC:
>>>>>>> 
>>>>>>> before: 2018-07-23T00:00:00/1.log
>>>>>>> after: 2018-07-23T01:00:00+01:00/1.log
>>>>>>> 
>>>>>>> We can possibly get away with an UPDATING note telling people to set a 
>>>>>>> custom log_filename_template. Testing this now.
>>>>>>> 
>>>>>>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor <a...@firemirror.com> wrote:
>>>>>>>> 
>>>>>>>> -1 (binding) from me.
>>>>>>>> 
>>>>>>>> Installed with:
>>>>>>>> 
>>>>>>>> AIRFLOW_GPL_UNIDECODE=yes pip install \
>>>>>>>>   'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr, s3, crypto]>=1.10'
>>>>>>>> 
>>>>>>>> Install went fine.
>>>>>>>> 
>>>>>>>> Our DAGs that use SparkSubmitOperator are now failing, as there is a 
>>>>>>>> hard dependency on the Kubernetes client libs, but the `emr` group 
>>>>>>>> doesn't mention this.
>>>>>>>> 
>>>>>>>> Introduced in https://github.com/apache/incubator-airflow/pull/3112
>>>>>>>> 
>>>>>>>> I see two options for this: either conditionally enable k8s:// 
>>>>>>>> support if the import works (sketched below), or (less preferred) add 
>>>>>>>> kube-client to the emr deps.
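>>>>>>>> 
>>>>>>>> The conditional import could look something like this (a sketch only; 
>>>>>>>> validate_master is a made-up helper, not the operator's real API):
>>>>>>>> 
>>>>>>>> # Guard the optional dependency so the module still imports when the
>>>>>>>> # Kubernetes client libs are absent.
>>>>>>>> try:
>>>>>>>>     import kubernetes  # noqa: F401
>>>>>>>>     HAS_KUBERNETES = True
>>>>>>>> except ImportError:
>>>>>>>>     HAS_KUBERNETES = False
>>>>>>>> 
>>>>>>>> def validate_master(master):
>>>>>>>>     # Reject k8s:// masters only when the client libs are missing.
>>>>>>>>     if master.startswith("k8s://") and not HAS_KUBERNETES:
>>>>>>>>         raise RuntimeError(
>>>>>>>>             "Spark on Kubernetes requires the kubernetes client libs "
>>>>>>>>             "(pip install kubernetes)"
>>>>>>>>         )
>>>>>>>>     return master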
>>>>>>>> 
>>>>>>>> Sorry - this is the first time I've been able to test it.
>>>>>>>> 
>>>>>>>> I will install this dep manually and continue testing.
>>>>>>>> 
>>>>>>>> -ash
>>>>>>>> 
>>>>>>>> (Normally no time at home due to new baby, but I got a standing desk, 
>>>>>>>> and a carrier meaning she can sleep on me and I can use my laptop. 
>>>>>>>> Win!)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 4 Aug 2018, at 22:32, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Bump. 
>>>>>>>>> 
>>>>>>>>> Committers please cast your vote. 
>>>>>>>>> 
>>>>>>>>> B.
>>>>>>>>> 
>>>>>>>>> Sent from my iPhone
>>>>>>>>> 
>>>>>>>>>> On 3 Aug 2018, at 13:23, Driesprong, Fokko <fo...@driesprong.frl> wrote:
>>>>>>>>>> 
>>>>>>>>>> +1 Binding
>>>>>>>>>> 
>>>>>>>>>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
>>>>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>>>>>>>>>> 
>>>>>>>>>> Cheers, Fokko
>>>>>>>>>> 
>>>>>>>>>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin <bdbr...@gmail.com>:
>>>>>>>>>> 
>>>>>>>>>>> Hey all,
>>>>>>>>>>> 
>>>>>>>>>>> I have cut Airflow 1.10.0 RC3. This email is calling a vote on the 
>>>>>>>>>>> release,
>>>>>>>>>>> which will last for 72 hours. Consider this my (binding) +1.
>>>>>>>>>>> 
>>>>>>>>>>> Airflow 1.10.0 RC 3 is available at:
>>>>>>>>>>> 
>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/
>>>>>>>>>>> 
>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release 
>>>>>>>>>>> that comes with INSTALL instructions.
>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python 
>>>>>>>>>>> "sdist" release.
>>>>>>>>>>> 
>>>>>>>>>>> Public keys are available at:
>>>>>>>>>>> 
>>>>>>>>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/
>>>>>>>>>>> 
>>>>>>>>>>> The number of JIRAs fixed is over 700. Please have a look at the 
>>>>>>>>>>> changelog.
>>>>>>>>>>> Since RC2 the following has been fixed:
>>>>>>>>>>> 
>>>>>>>>>>> * [AIRFLOW-2817] Force explicit choice on GPL dependency
>>>>>>>>>>> * [AIRFLOW-2716] Replace async and await py3.7 keywords
>>>>>>>>>>> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
>>>>>>>>>>> 
>>>>>>>>>>> Please note that the version number excludes the `rcX` string as 
>>>>>>>>>>> well
>>>>>>>>>>> as the "+incubating" string, so it's now simply 1.10.0. This will 
>>>>>>>>>>> allow us
>>>>>>>>>>> to rename the artifact without modifying the artifact checksums 
>>>>>>>>>>> when we
>>>>>>>>>>> actually release.
>>>>>>>>>>> 
>>>>>>>>>>> WARNING: Due to licensing requirements you will need to set
>>>>>>>>>>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when
>>>>>>>>>>> installing or upgrading. We will try to remove this requirement for 
>>>>>>>>>>> the
>>>>>>>>>>> next release.
>>>>>>>>>>> 
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Bolke
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
