[
https://issues.apache.org/jira/browse/HIVE-27741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835351#comment-17835351
]
Denys Kuzmenko commented on HIVE-27741:
---------------------------------------
Merged to master
[~zratkai] thanks for the patch and [~okumin], [~simhadri-g] for the review!
> Invalid timezone value in to_utc_timestamp() is treated as UTC which can lead
> to data consistency issues
> --------------------------------------------------------------------------------------------------------
>
> Key: HIVE-27741
> URL: https://issues.apache.org/jira/browse/HIVE-27741
> Project: Hive
> Issue Type: Sub-task
> Affects Versions: 4.0.0-beta-1
> Reporter: Janos Kovacs
> Assignee: Zoltán Rátkai
> Priority: Major
> Labels: pull-request-available
>
> When the timezone specified in the *to_utc_timestamp()* function is not
> valid, it still treated as UTC instead of throwing an error. If the user
> accidentally made a typo - e.g. America/Los{color:#ff0000}*t*{color}_Angeles,
> the query runs successfully returning an invalid converted value which can
> lead to data consistency issues.
> Repro code:
> {noformat}
> docker rm -f hive4
> export HIVE_VERSION=4.0.0-beta-2-SNAPSHOT
> export HS2_ENV_TZ="Europe/Budapest"
> export HS2_USER_TZ=${HS2_ENV_TZ}
> export HIVE_LOCAL_TZ="America/Los_Angeles"
> export HS2_OPTS="-Duser.timezone=$HS2_USER_TZ
> -Dhive.local.time.zone=$HIVE_LOCAL_TZ"
> export HS2_OPTS="$HS2_OPTS
> -Dhive.server2.tez.initialize.default.sessions=false"
> docker run -d -p 10000:10000 -p 10001:10001 -p 10002:10002 --env
> TZ=${HS2_ENV_TZ} --env SERVICE_OPTS=${HS2_OPTS} --env
> SERVICE_NAME=hiveserver2 --name hive4 apache/hive:${HIVE_VERSION}
> docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/' -e "
> SELECT '\${env:TZ}' as \`env:TZ\`,
> '\${system:user.timezone}' as \`system:user.timezone\`,
> '\${hiveconf:hive.local.time.zone}' as
> \`hiveconf:hive.local.time.zone\`;
> DROP TABLE IF EXISTS timestamptest;
> CREATE TABLE timestamptest (
> ts timestamp,
> tz timestamp with local time zone
> ) STORED AS TEXTFILE;
> INSERT INTO timestamptest select TIMESTAMP'2016-01-03
> 12:26:34',TIMESTAMPLOCALTZ'2016-01-03 12:26:34 America/Los_Angeles';
> SELECT
> tz as orig,
> to_utc_timestamp(tz, 'America/Los_Angeles') as utc_correct_tz,
> to_utc_timestamp(tz, 'Europe/HereIsATypo') as utc_incorrect_tz,
> to_utc_timestamp(tz, 'LOCAL') as
> utc_local_aslo_incorrect_tz,
> to_utc_timestamp(tz, 'UTC') as utc_tz
> FROM timestamptest;
> "
> {noformat}
>
> The results are:
> {noformat}
> +------------------+-----------------------+--------------------------------+
> | env:tz | system:user.timezone | hiveconf:hive.local.time.zone |
> +------------------+-----------------------+--------------------------------+
> | Europe/Budapest | Europe/Budapest | America/Los_Angeles |
> +------------------+-----------------------+--------------------------------+
> +--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
> | orig | utc_correct_tz |
> utc_incorrect_tz | utc_local_aslo_incorrect_tz | utc_tz |
> +--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
> | 2016-01-03 12:26:34.0 America/Los_Angeles | 2016-01-03 20:26:34.0 |
> 2016-01-03 12:26:34.0 | 2016-01-03 12:26:34.0 | 2016-01-03 12:26:34.0
> |
> +--------------------------------------------+------------------------+------------------------+------------------------------+------------------------+
> {noformat}
> Note:
> * the invalid timezone - utc_incorrect_tz - is treated as UTC
> * also note that LOCAL is also treated as UTC which in fact should be
> treated as system's timezone, but as LOCAL is also an invalid timezone value
> in hive4, ut becomes UTC just like any other invalid and/or typo timezone
> values (see HIVE-27742)
>
> Hive should throw an Exception in that case to let the user know that the
> provided timezone is wrong - at least this should be configurable, e.g. via
> something like {*}hive.strict.time.zone.check{*}.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)