devanbenz commented on issue #12218:
URL: https://github.com/apache/datafusion/issues/12218#issuecomment-2341665968
> Just for the sake of clarity, my original description of the behaviour
holds with times other than `now()`.
>
> ```
> > select arrow_typeof(('2020-01-01T00:00:00Z' AT TIME ZONE
'America/New_York'));
> +-------------------------------------------------+
> | arrow_typeof(Utf8("2020-01-01T00:00:00Z")) |
> +-------------------------------------------------+
> | Timestamp(Nanosecond, Some("America/New_York")) |
> +-------------------------------------------------+
> 1 row(s) fetched.
> Elapsed 0.002 seconds.
>
> > select arrow_typeof(('2020-01-01T00:00:00Z' AT TIME ZONE
'America/New_York')::timestamp);
> +--------------------------------------------+
> | arrow_typeof(Utf8("2020-01-01T00:00:00Z")) |
> +--------------------------------------------+
> | Timestamp(Nanosecond, None) |
> +--------------------------------------------+
> 1 row(s) fetched.
> Elapsed 0.002 seconds.
> ```
>
> That being said, if this is the expected behaviour of other SQL dialects
then it seems reasonable to me that it would be behave the same way in
DataFusion as well.
Ah yeah just testing something similar out in `Clickhouse` I'm seeing that
it strips the timezone information (in fact a raw string being called with
Timezone results in an error) which is in line with how arrow currently works.
```
:) SELECT toTimeZone('2024-09-10 11:45:19'::timestamp, 'America/Denver');
SELECT toTimeZone(CAST('2024-09-10 11:45:19', 'timestamp'), 'America/Denver')
Query id: 1ae2e612-51cd-47e1-9cbb-81acb5fae491
┌─toTimeZone(CAST('2024-09-10 11:45:19', 'timestamp'), 'America/Denver')─┐
1. │ 2024-09-10 10:45:19 │
└────────────────────────────────────────────────────────────────────────┘
1 row in set. Elapsed: 0.002 sec.
:) SELECT toTimeZone('2024-09-10 11:45:19'::timestamp,
'America/Denver')::timestamp;
SELECT CAST(toTimeZone(CAST('2024-09-10 11:45:19', 'timestamp'),
'America/Denver'), 'timestamp')
Query id: 2463bbee-363b-4339-91b2-4c9055e064ce
┌─CAST(toTimeZone(CAST('2024-09-10 11:45:19', 'timestamp'),
'America/Denver'), 'timestamp')─┐
1. │
2024-09-10 11:45:19 │
└───────────────────────────────────────────────────────────────────────────────────────────┘
1 row in set. Elapsed: 0.002 sec.
```
But taking a look at postgres that is not the case as you had mentioned
above:
```
postgres=# SELECT ('2024-09-10 11:45:19' AT TIME ZONE 'America/Denver');
timezone
---------------------
2024-09-10 10:45:19
(1 row)
postgres=# SELECT ('2024-09-10 11:45:19' AT TIME ZONE
'America/Denver')::timestamp;
timezone
---------------------
2024-09-10 10:45:19
(1 row)
```
Taking a look at MariaDB I'm seeing a similar output:
```
MariaDB [(none)]> SELECT TIMESTAMP(CONVERT_TZ('2024-09-10 11:45:19',
'-5:00', '-6:00'));
+----------------------------------------------------------------+
| TIMESTAMP(CONVERT_TZ('2024-09-10 11:45:19', '-5:00', '-6:00')) |
+----------------------------------------------------------------+
| 2024-09-10 10:45:19 |
+----------------------------------------------------------------+
1 row in set (0.001 sec)
```
I think the `now()` function should probably be changed as users would most
likely expect it to function similar to other engines but as for the current
casting methodology I'm going to update
https://github.com/apache/arrow-rs/issues/5827 with a proposal thats more in
line with how established SQL database systems currently operate with
`timestamp -> timezone update -> timestamp` conversion.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]