[ 
https://issues.apache.org/jira/browse/IMPALA-10831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Philippe Gravel updated IMPALA-10831:
------------------------------------------
    Description: 
I take special care to save all my timestamps as UTC in the table:

{{  CREATE TABLE filter_pulses (}}
 {{    tag STRING,}}
 {{    ts TIMESTAMP,}}
 {{    val DOUBLE}}
 {{  )}}
 {{  PARTITIONED BY (}}
 {{    ym STRING –- ym stands for year-month}}
 {{  )}}
 {{  STORED AS PARQUET;}}

When I do the following query:

{{  SELECT max(ts) FROM filter_pulses;}}

{{I' expect to recieve '2021-07-23 23:59:59' but get this result (I'm in 
America/Montreal, UTC-4:00):}}

    1    2021-07-{color:#de350b}23 {color}{color:#de350b}03:59:59{color}

But when I do that other query:

    SELECT tag, max(ts) FROM filter_pulses GROUP BY tag ORDER BY tag

I have all my tags max(ts) 23:59:59 as expected:
|4203-PDT-DPB-400-01:PV|2021-07-22 23:59:59.0|
|4203-PDT-DPB-410-01:PV|2021-07-22 23:59:59.0|
|4203-PDT-DPB-420-01:PV|2021-07-22 23:59:59.0|
|4203-PDT-DPB-430-01:PV|2021-07-22 23:59:59.0|
|4203-PDT-DPB-440-01:PV|2021-07-22 23:59:59.0|
|4203-PDT-DPB-450-01:PV|2021-07-22 23:59:59.0|
|4203-PDT-DPB-460-01:PV|2021-07-22 23:59:59.0|
|4203-PDT-DPB-470-01:PV|2021-07-22 23:59:59.0|

None of the tags have a maximum timestamp on the 23rd july at 4 am minus one 
sec.

I also tested the faulty query on HIVE and the result is the good expected 
result of '2021-07-22 23:59:59.0'. {color:#00875a}HIVE is good.{color} IMPALA 
interprets the date tz from the parquet data as America/Montreal and convert it 
to UTC.

See the attached data sample sample for your tests

  was:
I take special care to save all my timestamps as UTC in the table:

{{  CREATE TABLE filter_pulses (}}
 {{    tag STRING,}}
 {{    ts TIMESTAMP,}}
 {{    val DOUBLE}}
 {{  )}}
 {{  PARTITIONED BY (}}
 {{    ym STRING –- ym stands for year-month}}
 {{  )}}
 {{  STORED AS PARQUET;}}

When I do the following query:

{{  SELECT max(ts) FROM filter_pulses;}}

{{I' expect to recieve '2021-07-23 23:59:59' but get this result (I'm in 
America/Montreal, UTC-4:00):}}

    1    2021-07-{color:#de350b}23 {color}{color:#de350b}03:59:59{color}

But when I do that other query:

    SELECT tag, max(ts) FROM filter_pulses GROUP BY tag ORDER BY tag

I have all my tags max(ts) 23:59:59 as expected:
|4203-PDT-DPB-400-01:PV|2021-07-22 23:59:59.0|
|4203-PDT-DPB-410-01:PV|2021-07-22 23:59:59.0|
|4203-PDT-DPB-420-01:PV|2021-07-22 23:59:59.0|
|4203-PDT-DPB-430-01:PV|2021-07-22 23:59:59.0|
|4203-PDT-DPB-440-01:PV|2021-07-22 23:59:59.0|
|4203-PDT-DPB-450-01:PV|2021-07-22 23:59:59.0|
|4203-PDT-DPB-460-01:PV|2021-07-22 23:59:59.0|
|4203-PDT-DPB-470-01:PV|2021-07-22 23:59:59.0|

None of the tags have a maximum timestamp on the 23rd july at 4 am minus one 
sec.

I also tested the faulty query on HIVE and the result is the good expected 
result of '2021-07-22 23:59:59.0'. {color:#00875a}HIVE is good.{color} IMPALA 
interprets the date tz from the parquet data as America/Montreal and convert it 
to UTC.


> Inconsistent time zone conversion to local
> ------------------------------------------
>
>                 Key: IMPALA-10831
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10831
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>         Environment: In IMPALA: SELECT version():
>     impalad version 3.2.0-cdh6.2.0 RELEASE (build 
> edc19942b4debdbfd485fbd26098eef435003f5d) Built on Thu Mar 14 00:14:36 PDT 
> 2019
> -------------------
> SERVER (cat /etc/redhat-release):
>     CentOS release 6.10 (Final)
>            Reporter: Jean-Philippe Gravel
>            Priority: Major
>         Attachments: data.csv.gz
>
>
> I take special care to save all my timestamps as UTC in the table:
> {{  CREATE TABLE filter_pulses (}}
>  {{    tag STRING,}}
>  {{    ts TIMESTAMP,}}
>  {{    val DOUBLE}}
>  {{  )}}
>  {{  PARTITIONED BY (}}
>  {{    ym STRING –- ym stands for year-month}}
>  {{  )}}
>  {{  STORED AS PARQUET;}}
> When I do the following query:
> {{  SELECT max(ts) FROM filter_pulses;}}
> {{I' expect to recieve '2021-07-23 23:59:59' but get this result (I'm in 
> America/Montreal, UTC-4:00):}}
>     1    2021-07-{color:#de350b}23 {color}{color:#de350b}03:59:59{color}
> But when I do that other query:
>     SELECT tag, max(ts) FROM filter_pulses GROUP BY tag ORDER BY tag
> I have all my tags max(ts) 23:59:59 as expected:
> |4203-PDT-DPB-400-01:PV|2021-07-22 23:59:59.0|
> |4203-PDT-DPB-410-01:PV|2021-07-22 23:59:59.0|
> |4203-PDT-DPB-420-01:PV|2021-07-22 23:59:59.0|
> |4203-PDT-DPB-430-01:PV|2021-07-22 23:59:59.0|
> |4203-PDT-DPB-440-01:PV|2021-07-22 23:59:59.0|
> |4203-PDT-DPB-450-01:PV|2021-07-22 23:59:59.0|
> |4203-PDT-DPB-460-01:PV|2021-07-22 23:59:59.0|
> |4203-PDT-DPB-470-01:PV|2021-07-22 23:59:59.0|
> None of the tags have a maximum timestamp on the 23rd july at 4 am minus one 
> sec.
> I also tested the faulty query on HIVE and the result is the good expected 
> result of '2021-07-22 23:59:59.0'. {color:#00875a}HIVE is good.{color} IMPALA 
> interprets the date tz from the parquet data as America/Montreal and convert 
> it to UTC.
> See the attached data sample sample for your tests



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to