[ 
https://issues.apache.org/jira/browse/HIVE-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15634:
-------------------------------------------
    Description: 
{{SET hive.tez.java.opts=-Duser.timezone="UTC";}} can be used to change 
timezone for Tez tasks. However, when Fetch optimizer kicks in because we can 
push the full query to Druid, we obtain different values for the timestamp than 
when jobs are executed. This probably has to do with the timezone on the client 
side. How should we handle this issue?

For instance, this can be observed with the following query:
{code:sql}
set hive.fetch.task.conversion=more;
SELECT DISTINCT `__time`
FROM store_sales_sold_time_subset
WHERE `__time` < '1999-11-10 00:00:00';
OK
1999-10-31 19:00:00
1999-11-01 19:00:00
1999-11-02 19:00:00
1999-11-03 19:00:00
1999-11-04 19:00:00
1999-11-05 19:00:00
1999-11-06 19:00:00
1999-11-07 19:00:00
1999-11-08 19:00:00

set hive.fetch.task.conversion=none;
SELECT DISTINCT `__time`
FROM store_sales_sold_time_subset
WHERE `__time` < '1999-11-10 00:00:00';
OK
1999-11-01 00:00:00
1999-11-02 00:00:00
1999-11-03 00:00:00
1999-11-04 00:00:00
1999-11-05 00:00:00
1999-11-06 00:00:00
1999-11-07 00:00:00
1999-11-08 00:00:00
1999-11-09 00:00:00
{code}

  was:
When Fetch optimizer kicks in because we can push the full query to Druid, we 
obtain different values for the timestamp than when jobs are executed. This 
probably has to do with the timezone on the client side.

For instance, this can be observed with the following query:
{code:sql}
set hive.fetch.task.conversion=more;
SELECT DISTINCT `__time`
FROM store_sales_sold_time_subset
WHERE `__time` < '1999-11-10 00:00:00';
OK
1999-10-31 19:00:00
1999-11-01 19:00:00
1999-11-02 19:00:00
1999-11-03 19:00:00
1999-11-04 19:00:00
1999-11-05 19:00:00
1999-11-06 19:00:00
1999-11-07 19:00:00
1999-11-08 19:00:00

set hive.fetch.task.conversion=none;
SELECT DISTINCT `__time`
FROM store_sales_sold_time_subset
WHERE `__time` < '1999-11-10 00:00:00';
OK
1999-11-01 00:00:00
1999-11-02 00:00:00
1999-11-03 00:00:00
1999-11-04 00:00:00
1999-11-05 00:00:00
1999-11-06 00:00:00
1999-11-07 00:00:00
1999-11-08 00:00:00
1999-11-09 00:00:00
{code}


> Hive/Druid integration: Timestamp column inconsistent w/o Fetch optimization
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-15634
>                 URL: https://issues.apache.org/jira/browse/HIVE-15634
>             Project: Hive
>          Issue Type: Bug
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>
> {{SET hive.tez.java.opts=-Duser.timezone="UTC";}} can be used to change 
> timezone for Tez tasks. However, when Fetch optimizer kicks in because we can 
> push the full query to Druid, we obtain different values for the timestamp 
> than when jobs are executed. This probably has to do with the timezone on the 
> client side. How should we handle this issue?
> For instance, this can be observed with the following query:
> {code:sql}
> set hive.fetch.task.conversion=more;
> SELECT DISTINCT `__time`
> FROM store_sales_sold_time_subset
> WHERE `__time` < '1999-11-10 00:00:00';
> OK
> 1999-10-31 19:00:00
> 1999-11-01 19:00:00
> 1999-11-02 19:00:00
> 1999-11-03 19:00:00
> 1999-11-04 19:00:00
> 1999-11-05 19:00:00
> 1999-11-06 19:00:00
> 1999-11-07 19:00:00
> 1999-11-08 19:00:00
> set hive.fetch.task.conversion=none;
> SELECT DISTINCT `__time`
> FROM store_sales_sold_time_subset
> WHERE `__time` < '1999-11-10 00:00:00';
> OK
> 1999-11-01 00:00:00
> 1999-11-02 00:00:00
> 1999-11-03 00:00:00
> 1999-11-04 00:00:00
> 1999-11-05 00:00:00
> 1999-11-06 00:00:00
> 1999-11-07 00:00:00
> 1999-11-08 00:00:00
> 1999-11-09 00:00:00
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to