[
https://issues.apache.org/jira/browse/HIVE-25576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stamatis Zampetakis updated HIVE-25576:
---------------------------------------
Description:
HIVE-25403 and HIVE-25458 switched the internal datetime formatter used by
unix_timestamp and from_unixtime from {{java.text.SimpleDateFormat}} to
{{java.time.format.DateTimeFormatter}} in order to fix some bugs and
inconsistencies that arise when these functions are combined with other UDFs
that have already migrated to the java.time package.
The two Java formatters behave differently, which leads to different query
results. The patterns supported by the two formatters also differ, which makes
some existing queries crash at runtime after an upgrade. Adapting to the new
behavior of DateTimeFormatter is a challenging and time-consuming task for end
users, especially given the widespread use of the aforementioned unixtime
functions.
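The difference in results can be reproduced outside Hive with a minimal
sketch such as the one below. The class and method names are hypothetical and
are not part of Hive; the input, pattern and time zone are taken from the
example in the earlier description of this ticket, which reports 07:00:00 for
the legacy path and 06:42:04 for the java.time path. The exact output may
depend on the JDK and its time zone data.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper for illustration only; not Hive code.
final class FormatterComparison {
    private static final String OUTPUT_PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Legacy path: parse with SimpleDateFormat, render through java.util.Date.
    static String viaSimpleDateFormat(String text, String pattern, String zone) {
        try {
            long epochSeconds = new SimpleDateFormat(pattern).parse(text).getTime() / 1000;
            // A separate formatter is used for output so that the zone parsed
            // from the input does not leak into the rendering step.
            SimpleDateFormat out = new SimpleDateFormat(OUTPUT_PATTERN);
            out.setTimeZone(TimeZone.getTimeZone(zone));
            return out.format(new Date(epochSeconds * 1000L));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    // java.time path: parse with DateTimeFormatter, shift to the target zone.
    static String viaDateTimeFormatter(String text, String pattern, String zone) {
        DateTimeFormatter in = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern(pattern)
            .toFormatter();
        ZonedDateTime shifted =
            ZonedDateTime.parse(text, in).withZoneSameInstant(ZoneId.of(zone));
        return DateTimeFormatter.ofPattern(OUTPUT_PATTERN).format(shifted);
    }

    public static void main(String[] args) {
        String text = "1800-01-01 00:00:00 UTC";
        String pattern = "yyyy-MM-dd HH:mm:ss z";
        String zone = "Asia/Bangkok";
        System.out.println("SimpleDateFormat:  " + viaSimpleDateFormat(text, pattern, zone));
        System.out.println("DateTimeFormatter: " + viaDateTimeFormatter(text, pattern, zone));
    }
}
```

The java.time path applies the local mean time offset that the time zone
database records for Asia/Bangkok in 1800, which is where the 06:42:04 value
comes from.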
Although DateTimeFormatter is a clear improvement over SimpleDateFormat, some
users still want to retain the old behavior for compatibility reasons, so a
property is needed to facilitate migration.
The goal of this ticket is to introduce a new property, namely
{{hive.datetime.formatter}}, to control the formatter used by unix_timestamp
and from_unixtime. By default, the new {{DateTimeFormatter}} is used, while the
use of {{SimpleDateFormat}} is discouraged. Eventually, support for
{{SimpleDateFormat}} will be removed.
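One way such a property could select the formatter is sketched below. This is
an illustration under stated assumptions, not the implementation in Hive: the
class, enum and method names are hypothetical, and the value names DATETIME
and SIMPLE are assumed here because the description does not list the accepted
values.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration; these are not Hive's actual classes.
final class FormatterSwitch {
    // Assumed value names for hive.datetime.formatter.
    enum Kind { DATETIME, SIMPLE }

    // Parses text to epoch seconds with the formatter chosen by the property value.
    static long toEpochSeconds(String propertyValue, String text, String pattern, String zone) {
        Kind kind = Kind.valueOf(propertyValue.trim().toUpperCase(Locale.ROOT));
        if (kind == Kind.SIMPLE) {
            try {
                SimpleDateFormat legacy = new SimpleDateFormat(pattern);
                // The session zone applies when the text carries no zone of its own.
                legacy.setTimeZone(TimeZone.getTimeZone(zone));
                return legacy.parse(text).getTime() / 1000;
            } catch (ParseException e) {
                throw new IllegalArgumentException("Cannot parse: " + text, e);
            }
        }
        DateTimeFormatter modern = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern(pattern)
            .toFormatter()
            // Override zone; it has no effect if a zone is parsed from the text.
            .withZone(ZoneId.of(zone));
        return ZonedDateTime.parse(text, modern).toInstant().getEpochSecond();
    }
}
```

For example, `FormatterSwitch.toEpochSeconds("SIMPLE", "1970-01-02 00:00:00",
"yyyy-MM-dd HH:mm:ss", "UTC")` takes the legacy path, while passing "DATETIME"
takes the java.time path; for this input both return 86400.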
was:
*History*
*Hive 1.2* -
VM time zone set to Asia/Bangkok
*Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 UTC','yyyy-MM-dd HH:mm:ss z'));
*Result* - 1800-01-01 07:00:00
*Implementation details* -
SimpleDateFormat formatter = new SimpleDateFormat(pattern);
Long unixtime = formatter.parse(textval).getTime() / 1000;
Date date = new Date(unixtime * 1000L);
The official documentation at
https://docs.oracle.com/javase/8/docs/api/java/util/Date.html mentions that
"Unfortunately, the API for these functions was not amenable to
internationalization" and that "the corresponding methods in Date are
deprecated". Because of that, this produces a wrong result.
*Master branch* -
set hive.local.time.zone=Asia/Bangkok;
*Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 UTC','yyyy-MM-dd HH:mm:ss z'));
*Result* - 1800-01-01 06:42:04
*Implementation details* -
DateTimeFormatter dtformatter = new DateTimeFormatterBuilder()
    .parseCaseInsensitive()
    .appendPattern(pattern)
    .toFormatter();
ZonedDateTime zonedDateTime = ZonedDateTime.parse(textval, dtformatter)
    .withZoneSameInstant(ZoneId.of(timezone));
Long dttime = zonedDateTime.toInstant().getEpochSecond();
*Problem*-
*SimpleDateFormat* has now been replaced with *DateTimeFormatter*, which gives
the correct result but is not backward compatible. This causes issues when
migrating to the new version, because older data written using Hive 1.x or 2.x
is not compatible with *DateTimeFormatter*.
*Solution*
Introduce a config "hive.legacy.timeParserPolicy" with the following values -
1. *True* - use *SimpleDateFormat*
2. *False* - use *DateTimeFormatter*
Note: Apache Spark also faces the same issue
https://issues.apache.org/jira/browse/SPARK-30668
> Configurable datetime formatter for unix_timestamp, from_unixtime
> -----------------------------------------------------------------
>
> Key: HIVE-25576
> URL: https://issues.apache.org/jira/browse/HIVE-25576
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2, 4.0.0
> Reporter: Ashish Sharma
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h
> Remaining Estimate: 0h