[ 
https://issues.apache.org/jira/browse/HIVE-25576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-25576:
---------------------------------------
    Description: 
HIVE-25403 and HIVE-25458 switched the internal datetime formatter used by 
unix_timestamp and from_unixtime from {{java.text.SimpleDateFormat}} to 
{{java.time.format.DateTimeFormatter}} in order to fix bugs and 
inconsistencies that appear when these functions are combined with other UDFs 
that have already migrated to the java.time package.

The two Java formatters behave differently, which leads to different query 
results. The patterns they support also differ, so some existing queries fail 
at runtime after an upgrade. Adapting to the behavior of DateTimeFormatter is 
a challenging and time-consuming task for end users, especially given the 
widespread use of the aforementioned unixtime functions.
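
As an illustration of the kind of difference described above, here is a minimal Java sketch that compares the two formatters directly, outside of Hive. The class and method names are hypothetical, and the two cases shown (a repeated {{a}} pattern letter and trailing text after the date) are examples chosen for this sketch, not cases taken from the ticket.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical helper class; it only probes the two JDK formatters.
public class FormatterDifferences {

    // True if SimpleDateFormat can be constructed with the pattern.
    public static boolean legacyAcceptsPattern(String pattern) {
        try {
            new SimpleDateFormat(pattern, Locale.ROOT);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // True if DateTimeFormatter can be constructed with the pattern.
    public static boolean modernAcceptsPattern(String pattern) {
        try {
            DateTimeFormatter.ofPattern(pattern, Locale.ROOT);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // SimpleDateFormat.parse(String) may stop before the end of the text.
    public static boolean legacyParses(String text, String pattern) {
        try {
            new SimpleDateFormat(pattern, Locale.ROOT).parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    // LocalDate.parse requires the whole text to match the pattern.
    public static boolean modernParses(String text, String pattern) {
        try {
            LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern, Locale.ROOT));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String amPmPattern = "yyyy-MM-dd hh:mm:ss aa";
        System.out.println("legacy accepts 'aa': " + legacyAcceptsPattern(amPmPattern));
        System.out.println("modern accepts 'aa': " + modernAcceptsPattern(amPmPattern));

        String text = "2021-01-01 extra";
        System.out.println("legacy parses trailing text: " + legacyParses(text, "yyyy-MM-dd"));
        System.out.println("modern parses trailing text: " + modernParses(text, "yyyy-MM-dd"));
    }
}
```

In both cases the legacy formatter is the more permissive one, which is why a query that worked before the switch can start failing afterwards.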

Although DateTimeFormatter is a clear improvement over SimpleDateFormat, some 
users still want to retain the old behavior for compatibility reasons, so a 
property is needed to facilitate migration.

The goal of this ticket is to introduce a new property, 
{{hive.datetime.formatter}}, to control the formatter used by unix_timestamp 
and from_unixtime. By default the new {{DateTimeFormatter}} is used, and the 
use of {{SimpleDateFormat}} is discouraged. Eventually, the 
{{SimpleDateFormat}} option will be removed.
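
A minimal Java sketch of how a property value could select between the two formatters follows. It is not the Hive implementation: the class, enum, and method names are hypothetical, and the value names DATETIME and SIMPLE are an assumption made for this sketch, since the description above does not list the accepted values.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch; names are not taken from the Hive code base.
public class FormatterSelection {

    // Assumed value names for hive.datetime.formatter.
    public enum Kind { DATETIME, SIMPLE }

    // Maps a property value to a formatter kind, ignoring case and padding.
    public static Kind fromProperty(String value) {
        return Kind.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Formats seconds since the epoch with the selected formatter.
    public static String format(Kind kind, long epochSeconds, String pattern, ZoneId zone) {
        if (kind == Kind.SIMPLE) {
            SimpleDateFormat legacy = new SimpleDateFormat(pattern, Locale.ROOT);
            legacy.setTimeZone(TimeZone.getTimeZone(zone));
            return legacy.format(new Date(epochSeconds * 1000L));
        }
        return DateTimeFormatter.ofPattern(pattern, Locale.ROOT)
                .withZone(zone)
                .format(Instant.ofEpochSecond(epochSeconds));
    }

    public static void main(String[] args) {
        ZoneId utc = ZoneId.of("UTC");
        // Same pattern, same instant: both formatters agree here.
        System.out.println(format(fromProperty("datetime"), 0L, "yyyy-MM-dd HH:mm:ss", utc));
        System.out.println(format(fromProperty("simple"), 0L, "yyyy-MM-dd HH:mm:ss", utc));
        // 'u' means year in DateTimeFormatter but day number of week in SimpleDateFormat.
        System.out.println(format(Kind.DATETIME, 0L, "u", utc));
        System.out.println(format(Kind.SIMPLE, 0L, "u", utc));
    }
}
```

Keeping the choice behind a single switch means the legacy branch can later be deleted without touching the callers.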

  was:
*History*

*Hive 1.2* - 

VM time zone set to Asia/Bangkok

*Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 UTC','yyyy-MM-dd HH:mm:ss z'));

*Result* - 1800-01-01 07:00:00

*Implementation details* - 

SimpleDateFormat formatter = new SimpleDateFormat(pattern);
Long unixtime = formatter.parse(textval).getTime() / 1000;
Date date = new Date(unixtime * 1000L);

The official documentation 
(https://docs.oracle.com/javase/8/docs/api/java/util/Date.html) notes that 
"Unfortunately, the API for these functions was not amenable to 
internationalization" and that "The corresponding methods in Date are 
deprecated". Because of this, the query produces a wrong result.

*Master branch* - 

set hive.local.time.zone=Asia/Bangkok;

*Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 UTC','yyyy-MM-dd HH:mm:ss z'));

*Result* - 1800-01-01 06:42:04

*Implementation details* - 

DateTimeFormatter dtformatter = new DateTimeFormatterBuilder()
        .parseCaseInsensitive()
        .appendPattern(pattern)
        .toFormatter();

ZonedDateTime zonedDateTime = ZonedDateTime.parse(textval, dtformatter)
        .withZoneSameInstant(ZoneId.of(timezone));
Long dttime = zonedDateTime.toInstant().getEpochSecond();
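
The 06:42:04 result can be reproduced with java.time alone. The sketch below skips the parsing step and builds the instant directly, then views it in Asia/Bangkok; the class and method names are hypothetical. It relies on the time zone data shipped with the JDK, which records a local mean time offset of +06:42:04 for Bangkok in 1800. The legacy formatter is only printed for comparison, with no claim here about what it returns.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch; it does not go through Hive or its UDFs.
public class BangkokOffset {

    static final ZoneId BANGKOK = ZoneId.of("Asia/Bangkok");

    // 1800-01-01 00:00:00 UTC as an instant.
    public static Instant midnightUtc1800() {
        return LocalDateTime.of(1800, 1, 1, 0, 0, 0).toInstant(ZoneOffset.UTC);
    }

    // The same instant seen through java.time zone rules for Bangkok.
    public static ZonedDateTime modernView() {
        return midnightUtc1800().atZone(BANGKOK);
    }

    // The same instant formatted by the legacy API in the Bangkok time zone.
    public static String legacyView() {
        SimpleDateFormat legacy = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        legacy.setTimeZone(TimeZone.getTimeZone("Asia/Bangkok"));
        return legacy.format(Date.from(midnightUtc1800()));
    }

    public static void main(String[] args) {
        ZonedDateTime modern = modernView();
        System.out.println("java.time offset:     " + modern.getOffset());
        System.out.println("java.time local time: " + modern.toLocalDateTime());
        System.out.println("legacy local time:    " + legacyView());
    }
}
```

The offset of 6 hours, 42 minutes and 4 seconds accounts for the time of day reported on the master branch.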


*Problem*- 

*SimpleDateFormat* has been replaced with *DateTimeFormatter*, which gives the 
correct result but is not backward compatible. This causes issues when 
migrating to a new version, because older data written using Hive 1.x or 2.x 
is not compatible with *DateTimeFormatter*.

*Solution*

Introduce a config "hive.legacy.timeParserPolicy" with the following values:
1. *True* - use *SimpleDateFormat*
2. *False* - use *DateTimeFormatter*


Note: Apache Spark also faces the same issue: 
https://issues.apache.org/jira/browse/SPARK-30668




> Configurable datetime formatter for unix_timestamp, from_unixtime
> -----------------------------------------------------------------
>
>                 Key: HIVE-25576
>                 URL: https://issues.apache.org/jira/browse/HIVE-25576
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2, 4.0.0
>            Reporter: Ashish Sharma
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
