[ 
https://issues.apache.org/jira/browse/HIVE-27741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janos Kovacs updated HIVE-27741:
--------------------------------
    Description: 
When the timezone specified in the *to_utc_timestamp()* function is not valid, 
it still treated as UTC instead of throwing an error. If the user accidentally 
made a typo - e.g. America/Los{color:#ff0000}*t*{color}_Angeles, the query runs 
successfully returning an invalid converted value which can lead to data 
consistency issues. 

Repro code:
{noformat}
docker rm -f hive4

export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
export HS2_ENV_TZ="Europe/Budapest"
export HS2_USER_TZ=${HS2_ENV_TZ}
export HIVE_LOCAL_TZ="America/Los_Angeles"

export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ 
-Dhive.local.time.zone=$HIVE_LOCAL_TZ"
export HS2_OPTS="$HS2_OPTS  
-Dhive.server2.tez.initialize.default.sessions=false"
docker run -d -p 10000:10000 -p 10001:10001 -p 10002:10002 --env 
TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env SERVICE_NAME=hiveserver2 
--name hive4 apache/hive:${HIVE_VERSION}

docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/' -e "
SELECT '\${env:TZ}' as \`env:TZ\`,
       '\${system:user.timezone}' as \`system:user.timezone\`,
       '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;

DROP TABLE IF EXISTS timestamptest;
CREATE TABLE timestamptest (
  ts timestamp,
  tz timestamp with local time zone
) STORED AS TEXTFILE;
INSERT INTO timestamptest select TIMESTAMP'2016-01-03 
12:26:34',TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles';

SELECT
  tz                                            as orig,
  to_utc_timestamp(tz, 'America/Los_Angeles')   as utc_correct_tz,
  to_utc_timestamp(tz, 'Europe/HereIsATypo')    as utc_incorrect_tz,
  to_utc_timestamp(tz, 'LOCAL')                 as utc_local_aslo_incorrect_tz,
  to_utc_timestamp(tz, 'UTC')                   as utc_tz
FROM timestamptest;
"
{noformat}
 
The results are:
{noformat}
+------------------+-----------------------+--------------------------------+
|      env:tz      | system:user.timezone  | hiveconf:hive.local.time.zone  |
+------------------+-----------------------+--------------------------------+
| Europe/Budapest  | Europe/Budapest       | America/Los_Angeles            |
+------------------+-----------------------+--------------------------------+

+--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
|                    orig                    |     utc_correct_tz     |    
utc_incorrect_tz    | utc_local_aslo_incorrect_tz  | utc_tz                 |
+--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
| 2016-01-03 12:26:34.0 America/Los_Angeles  | 2016-01-03 20:26:34.0  | 
2016-01-03 12:26:34.0  | 2016-01-03 12:26:34.0        | 2016-01-03 12:26:34.0  |
+--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
{noformat}
Note:
 * the invalid timezone - utc_incorrect_tz - is treated as UTC
 * also note that LOCAL is also treated as UTC which in fact should be treated 
as system's timezone, but as LOCAL is also an invalid timezone value in hive4, 
ut becomes UTC just like any other invalid and/or typo timezone values (see 
HIVE-27742)

 

Hive should throw an Exception in that case to let the user know that the 
provided timezone is wrong - at least this should be configurable, e.g. via 
something like {*}hive.strict.time.zone.check{*}.

  was:
When the timezone specified in the *to_utc_timestamp()* function is not valid, 
it still treated as UTC instead of throwing an error. If the user accidentally 
made a typo - e.g. America/Los{color:#ff0000}*t*{color}_Angeles, the query runs 
successfully returning an invalid converted value which can lead to data 
consistency issues. 

Repro code:
{noformat}
docker rm -f hive4

export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
export HS2_ENV_TZ="Europe/Budapest"
export HS2_USER_TZ=${HS2_ENV_TZ}
export HIVE_LOCAL_TZ="America/Los_Angeles"

export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ 
-Dhive.local.time.zone=$HIVE_LOCAL_TZ"
export HS2_OPTS="$HS2_OPTS  
-Dhive.server2.tez.initialize.default.sessions=false"
docker run -d -p 10000:10000 -p 10001:10001 -p 10002:10002 --env 
TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env SERVICE_NAME=hiveserver2 
--name hive4 apache/hive:${HIVE_VERSION}

docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/' -e "
SELECT '\${env:TZ}' as \`env:TZ\`,
       '\${system:user.timezone}' as \`system:user.timezone\`,
       '\${hiveconf:hive.local.time.zone}' as \`hiveconf:hive.local.time.zone\`;

DROP TABLE IF EXISTS timestamptest;
CREATE TABLE timestamptest (
  ts timestamp,
  tz timestamp with local time zone
) STORED AS TEXTFILE;
INSERT INTO timestamptest select TIMESTAMP'2016-01-03 
12:26:34',TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles';

SELECT
  tz                                            as orig,
  to_utc_timestamp(tz, 'America/Los_Angeles')   as utc_correct_tz,
  to_utc_timestamp(tz, 'Europe/HereIsATypo')    as utc_incorrect_tz,
  to_utc_timestamp(tz, 'LOCAL')                 as utc_local_aslo_incorrect_tz,
  to_utc_timestamp(tz, 'UTC')                   as utc_tz
FROM timestamptest;
"
{noformat}
 
The results are:
{noformat}
+------------------+-----------------------+--------------------------------+
|      env:tz      | system:user.timezone  | hiveconf:hive.local.time.zone  |
+------------------+-----------------------+--------------------------------+
| Europe/Budapest  | Europe/Budapest       | America/Los_Angeles            |
+------------------+-----------------------+--------------------------------+

+--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
|                    orig                    |     utc_correct_tz     |    
utc_incorrect_tz    | utc_local_aslo_incorrect_tz  | utc_tz                 |
+--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
| 2016-01-03 12:26:34.0 America/Los_Angeles  | 2016-01-03 20:26:34.0  | 
2016-01-03 12:26:34.0  | 2016-01-03 12:26:34.0        | 2016-01-03 12:26:34.0  |
+--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
{noformat}
Note:
 * the invalid timezone - utc_incorrect_tz - is treated as UTC
 * also note that LOCAL is also treated as UTC which in fact should be treated 
as system's timezone, but as LOCAL is also an invalid timezone value in hive4, 
ut becomes UTC just like any other invalid and/or typo timezone values

 

Hive should throw an Exception in that case to let the user know that the 
provided timezone is wrong - at least this should be configurable, e.g. via 
something like {*}hive.strict.time.zone.check{*}.


> Invalid timezone value in to_utc_timestamp() is treated as UTC which can lead 
> to data consistency issues
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27741
>                 URL: https://issues.apache.org/jira/browse/HIVE-27741
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 4.0.0-beta-1
>            Reporter: Janos Kovacs
>            Assignee: Zoltán Rátkai
>            Priority: Major
>
> When the timezone specified in the *to_utc_timestamp()* function is not 
> valid, it still treated as UTC instead of throwing an error. If the user 
> accidentally made a typo - e.g. America/Los{color:#ff0000}*t*{color}_Angeles, 
> the query runs successfully returning an invalid converted value which can 
> lead to data consistency issues. 
> Repro code:
> {noformat}
> docker rm -f hive4
> export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
> export HS2_ENV_TZ="Europe/Budapest"
> export HS2_USER_TZ=${HS2_ENV_TZ}
> export HIVE_LOCAL_TZ="America/Los_Angeles"
> export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ 
> -Dhive.local.time.zone=$HIVE_LOCAL_TZ"
> export HS2_OPTS="$HS2_OPTS  
> -Dhive.server2.tez.initialize.default.sessions=false"
> docker run -d -p 10000:10000 -p 10001:10001 -p 10002:10002 --env 
> TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env 
> SERVICE_NAME=hiveserver2 --name hive4 apache/hive:${HIVE_VERSION}
> docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/' -e "
> SELECT '\${env:TZ}' as \`env:TZ\`,
>        '\${system:user.timezone}' as \`system:user.timezone\`,
>        '\${hiveconf:hive.local.time.zone}' as 
> \`hiveconf:hive.local.time.zone\`;
> DROP TABLE IF EXISTS timestamptest;
> CREATE TABLE timestamptest (
>   ts timestamp,
>   tz timestamp with local time zone
> ) STORED AS TEXTFILE;
> INSERT INTO timestamptest select TIMESTAMP'2016-01-03 
> 12:26:34',TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles';
> SELECT
>   tz                                            as orig,
>   to_utc_timestamp(tz, 'America/Los_Angeles')   as utc_correct_tz,
>   to_utc_timestamp(tz, 'Europe/HereIsATypo')    as utc_incorrect_tz,
>   to_utc_timestamp(tz, 'LOCAL')                 as 
> utc_local_aslo_incorrect_tz,
>   to_utc_timestamp(tz, 'UTC')                   as utc_tz
> FROM timestamptest;
> "
> {noformat}
>  
> The results are:
> {noformat}
> +------------------+-----------------------+--------------------------------+
> |      env:tz      | system:user.timezone  | hiveconf:hive.local.time.zone  |
> +------------------+-----------------------+--------------------------------+
> | Europe/Budapest  | Europe/Budapest       | America/Los_Angeles            |
> +------------------+-----------------------+--------------------------------+
> +--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
> |                    orig                    |     utc_correct_tz     |    
> utc_incorrect_tz    | utc_local_aslo_incorrect_tz  | utc_tz                 |
> +--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
> | 2016-01-03 12:26:34.0 America/Los_Angeles  | 2016-01-03 20:26:34.0  | 
> 2016-01-03 12:26:34.0  | 2016-01-03 12:26:34.0        | 2016-01-03 12:26:34.0 
>  |
> +--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
> {noformat}
> Note:
>  * the invalid timezone - utc_incorrect_tz - is treated as UTC
>  * also note that LOCAL is also treated as UTC which in fact should be 
> treated as system's timezone, but as LOCAL is also an invalid timezone value 
> in hive4, ut becomes UTC just like any other invalid and/or typo timezone 
> values (see HIVE-27742)
>  
> Hive should throw an Exception in that case to let the user know that the 
> provided timezone is wrong - at least this should be configurable, e.g. via 
> something like {*}hive.strict.time.zone.check{*}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to