[
https://issues.apache.org/jira/browse/IMPALA-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344719#comment-17344719
]
ASF subversion and git services commented on IMPALA-5121:
---------------------------------------------------------
Commit 4697db021411e95165cfd199e1aa3fc848dbfc8b in impala's branch
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4697db0 ]
IMPALA-5121: Fix AVG() on timestamp col with
use_local_tz_for_unix_timestamp_conversions
AVG used to contain a back and forth timezone conversion if
use_local_tz_for_unix_timestamp_conversions is true. This could
affect the results if there were values from different DST rules.
Note that AVG on timestamps has other issues besides this, see
IMPALA-7472 for details.
Testing:
- added a regression test
Change-Id: I999099de8e07269b96b75d473f5753be4479cecd
Reviewed-on: http://gerrit.cloudera.org:8080/17412
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> avg() on timestamp col is wrong with
> -use_local_tz_for_unix_timestamp_conversions
> ---------------------------------------------------------------------------------
>
> Key: IMPALA-5121
> URL: https://issues.apache.org/jira/browse/IMPALA-5121
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.5.0, Impala 2.2.10, Impala 2.3.4
> Reporter: Matthew Jacobs
> Assignee: Csaba Ringhofer
> Priority: Major
> Labels: timestamp
>
> The flag '-use_local_tz_for_unix_timestamp_conversions' was added for
> IMPALA-97. Enabling it results in timestamps sometimes being converted into
> localtime, but unfortunately this doesn't seem to be well defined when/where
> this conversion will happen.
> I've noticed that its use seems to break the avg() aggregate function on
> timestamp types (despite being an odd function on timestamps, it should still
> work).
> Impala by default, i.e. not enabling this flag:
> {code}
> [localhost:21000] > select timestamp_col from functional.alltypestiny;
> Query: select timestamp_col from functional.alltypestiny
> Query submitted at: 2017-03-27 18:50:57 (Coordinator:
> http://mj-desktop.ca.cloudera.com:25000)
> Query progress can be monitored at:
> http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=8242bb6012948f06:143961ed00000000
> +---------------------+
> | timestamp_col |
> +---------------------+
> | 2009-01-01 00:00:00 |
> | 2009-01-01 00:01:00 |
> | 2009-02-01 00:00:00 |
> | 2009-02-01 00:01:00 |
> | 2009-03-01 00:00:00 |
> | 2009-03-01 00:01:00 |
> | 2009-04-01 00:00:00 |
> | 2009-04-01 00:01:00 |
> +---------------------+
> Fetched 8 row(s) in 0.02s
> [localhost:21000] > select avg(timestamp_col) from functional.alltypestiny;
> Query: select avg(timestamp_col) from functional.alltypestiny
> Query submitted at: 2017-03-27 18:50:59 (Coordinator:
> http://mj-desktop.ca.cloudera.com:25000)
> Query progress can be monitored at:
> http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=534f6ab59b201b5e:40e2a86d00000000
> +---------------------+
> | avg(timestamp_col) |
> +---------------------+
> | 2009-02-14 23:45:30 |
> +---------------------+
> {code}
> Then enabling the flag results in the same timestamps returned when scanning,
> but evaluating them in avg() results in them being converted:
> {code}
> [localhost:21000] > select timestamp_col from functional.alltypestiny;
> Query: select timestamp_col from functional.alltypestiny
> Query submitted at: 2017-03-27 18:51:17 (Coordinator:
> http://mj-desktop.ca.cloudera.com:25000)
> Query progress can be monitored at:
> http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=ac4ab8fd8caf4be9:ebb0834d00000000
> +---------------------+
> | timestamp_col |
> +---------------------+
> | 2009-01-01 00:00:00 |
> | 2009-01-01 00:01:00 |
> | 2009-02-01 00:00:00 |
> | 2009-02-01 00:01:00 |
> | 2009-03-01 00:00:00 |
> | 2009-03-01 00:01:00 |
> | 2009-04-01 00:00:00 |
> | 2009-04-01 00:01:00 |
> +---------------------+
> Fetched 8 row(s) in 0.30s
> [localhost:21000] > select avg(timestamp_col) from functional.alltypestiny;
> Query: select avg(timestamp_col) from functional.alltypestiny
> Query submitted at: 2017-03-27 18:51:25 (Coordinator:
> http://mj-desktop.ca.cloudera.com:25000)
> Query progress can be monitored at:
> http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=9e4e2c16896090f7:8922c4f200000000
> +---------------------+
> | avg(timestamp_col) |
> +---------------------+
> | 2009-02-15 00:00:30 |
> +---------------------+
> Fetched 1 row(s) in 0.12s
> {code}
> This behavior seems inconsistent and I'm pretty sure is not intentional.
> There are two misleading functions on TimestampValue that will do this
> conversion when the flag is set: ToUnixTime() and ToSubsecondUnixTime().
> avg() seems to have started using ToSubsecondUnixTime() after IMPALA-2914.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]