Matthew Jacobs created IMPALA-5121:
--------------------------------------

             Summary: avg() on timestamp col is wrong with 
-use_local_tz_for_unix_timestamp_conversions
                 Key: IMPALA-5121
                 URL: https://issues.apache.org/jira/browse/IMPALA-5121
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 2.5.0, Impala 2.2.10, Impala 2.3.4
            Reporter: Matthew Jacobs
            Priority: Critical


The flag '-use_local_tz_for_unix_timestamp_conversions' was added for 
IMPALA-97. Enabling it results in timestamps sometimes being converted into 
localtime, but unfortunately this doesn't seem to be well defined when/where 
this conversion will happen.

I've noticed that its use seems to break the avg() aggregate function on 
timestamp types (despite being an odd function on timestamps, it should still 
work).

Impala by default, i.e. not enabling this flag:
{code}
[localhost:21000] > select timestamp_col from functional.alltypestiny;
Query: select timestamp_col from functional.alltypestiny
Query submitted at: 2017-03-27 18:50:57 (Coordinator: 
http://mj-desktop.ca.cloudera.com:25000)
Query progress can be monitored at: 
http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=8242bb6012948f06:143961ed00000000
+---------------------+
| timestamp_col       |
+---------------------+
| 2009-01-01 00:00:00 |
| 2009-01-01 00:01:00 |
| 2009-02-01 00:00:00 |
| 2009-02-01 00:01:00 |
| 2009-03-01 00:00:00 |
| 2009-03-01 00:01:00 |
| 2009-04-01 00:00:00 |
| 2009-04-01 00:01:00 |
+---------------------+
Fetched 8 row(s) in 0.02s
[localhost:21000] > select avg(timestamp_col) from functional.alltypestiny;
Query: select avg(timestamp_col) from functional.alltypestiny
Query submitted at: 2017-03-27 18:50:59 (Coordinator: 
http://mj-desktop.ca.cloudera.com:25000)
Query progress can be monitored at: 
http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=534f6ab59b201b5e:40e2a86d00000000
+---------------------+
| avg(timestamp_col)  |
+---------------------+
| 2009-02-14 23:45:30 |
+---------------------+
{code}

Then enabling the flag results in the same timestamps returned when scanning, 
but evaluating them in avg() results in them being converted:
{code}
[localhost:21000] > select timestamp_col from functional.alltypestiny;
Query: select timestamp_col from functional.alltypestiny
Query submitted at: 2017-03-27 18:51:17 (Coordinator: 
http://mj-desktop.ca.cloudera.com:25000)
Query progress can be monitored at: 
http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=ac4ab8fd8caf4be9:ebb0834d00000000
+---------------------+
| timestamp_col       |
+---------------------+
| 2009-01-01 00:00:00 |
| 2009-01-01 00:01:00 |
| 2009-02-01 00:00:00 |
| 2009-02-01 00:01:00 |
| 2009-03-01 00:00:00 |
| 2009-03-01 00:01:00 |
| 2009-04-01 00:00:00 |
| 2009-04-01 00:01:00 |
+---------------------+
Fetched 8 row(s) in 0.30s
[localhost:21000] > select avg(timestamp_col) from functional.alltypestiny;
Query: select avg(timestamp_col) from functional.alltypestiny
Query submitted at: 2017-03-27 18:51:25 (Coordinator: 
http://mj-desktop.ca.cloudera.com:25000)
Query progress can be monitored at: 
http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=9e4e2c16896090f7:8922c4f200000000
+---------------------+
| avg(timestamp_col)  |
+---------------------+
| 2009-02-15 00:00:30 |
+---------------------+
Fetched 1 row(s) in 0.12s
{code}

This behavior seems inconsistent and I'm pretty sure is not intentional. There 
are two misleading functions on TimestampValue that will do this conversion 
when the flag is set: ToUnixTime() and ToSubsecondUnixTime(). avg() seems to 
have started using ToSubsecondUnixTime() after IMPALA-2914.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to