[ 
https://issues.apache.org/jira/browse/HIVE-27741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27741:
----------------------------------
    Affects Version/s: 4.0.0

> Invalid timezone value in to_utc_timestamp() is treated as UTC which can lead 
> to data consistency issues
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27741
>                 URL: https://issues.apache.org/jira/browse/HIVE-27741
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 4.0.0, 4.0.0-beta-1
>            Reporter: Janos Kovacs
>            Assignee: Zoltán Rátkai
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> When the timezone specified in the *to_utc_timestamp()* function is not 
> valid, it still treated as UTC instead of throwing an error. If the user 
> accidentally made a typo - e.g. America/Los{color:#ff0000}*t*{color}_Angeles, 
> the query runs successfully returning an invalid converted value which can 
> lead to data consistency issues. 
> Repro code:
> {noformat}
> docker rm -f hive4
> export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
> export HS2_ENV_TZ="Europe/Budapest"
> export HS2_USER_TZ=${HS2_ENV_TZ}
> export HIVE_LOCAL_TZ="America/Los_Angeles"
> export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ 
> -Dhive.local.time.zone=$HIVE_LOCAL_TZ"
> export HS2_OPTS="$HS2_OPTS  
> -Dhive.server2.tez.initialize.default.sessions=false"
> docker run -d -p 10000:10000 -p 10001:10001 -p 10002:10002 --env 
> TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env 
> SERVICE_NAME=hiveserver2 --name hive4 apache/hive:${HIVE_VERSION}
> docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/' -e "
> SELECT '\${env:TZ}' as \`env:TZ\`,
>        '\${system:user.timezone}' as \`system:user.timezone\`,
>        '\${hiveconf:hive.local.time.zone}' as 
> \`hiveconf:hive.local.time.zone\`;
> DROP TABLE IF EXISTS timestamptest;
> CREATE TABLE timestamptest (
>   ts timestamp,
>   tz timestamp with local time zone
> ) STORED AS TEXTFILE;
> INSERT INTO timestamptest select TIMESTAMP'2016-01-03 
> 12:26:34',TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles';
> SELECT
>   tz                                            as orig,
>   to_utc_timestamp(tz, 'America/Los_Angeles')   as utc_correct_tz,
>   to_utc_timestamp(tz, 'Europe/HereIsATypo')    as utc_incorrect_tz,
>   to_utc_timestamp(tz, 'LOCAL')                 as 
> utc_local_aslo_incorrect_tz,
>   to_utc_timestamp(tz, 'UTC')                   as utc_tz
> FROM timestamptest;
> "
> {noformat}
>  
> The results are:
> {noformat}
> +------------------+-----------------------+--------------------------------+
> |      env:tz      | system:user.timezone  | hiveconf:hive.local.time.zone  |
> +------------------+-----------------------+--------------------------------+
> | Europe/Budapest  | Europe/Budapest       | America/Los_Angeles            |
> +------------------+-----------------------+--------------------------------+
> +--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
> |                    orig                    |     utc_correct_tz     |    
> utc_incorrect_tz    | utc_local_aslo_incorrect_tz  | utc_tz                 |
> +--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
> | 2016-01-03 12:26:34.0 America/Los_Angeles  | 2016-01-03 20:26:34.0  | 
> 2016-01-03 12:26:34.0  | 2016-01-03 12:26:34.0        | 2016-01-03 12:26:34.0 
>  |
> +--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
> {noformat}
> Note:
>  * the invalid timezone - utc_incorrect_tz - is treated as UTC
>  * also note that LOCAL is also treated as UTC which in fact should be 
> treated as system's timezone, but as LOCAL is also an invalid timezone value 
> in hive4, ut becomes UTC just like any other invalid and/or typo timezone 
> values (see HIVE-27742)
>  
> Hive should throw an Exception in that case to let the user know that the 
> provided timezone is wrong - at least this should be configurable, e.g. via 
> something like {*}hive.strict.time.zone.check{*}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to