[ 
https://issues.apache.org/jira/browse/HIVE-24814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296225#comment-17296225
 ] 

David Mollitor edited comment on HIVE-24814 at 3/5/21, 5:57 PM:
----------------------------------------------------------------

Here is another example:

{quote}
months_between(date1, date2)

Returns number of months between dates date1 and date2 (as of Hive 1.2.0). If 
date1 is later than date2, then the result is positive. If date1 is earlier 
than date2, then the result is negative. If date1 and date2 are either the same 
days of the month or both last days of months, then the result is always an 
integer. Otherwise the UDF calculates the fractional portion of the result 
based on a 31-day month and considers the difference in time components date1 
and date2. date1 and date2 type can be date, timestamp or string in the format 
'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The result is rounded to 8 decimal 
places. Example: months_between('1997-02-28 10:30:00', '1996-10-30') = 
3.94959677
{quote}
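
As a rough check of the documented example, the rule quoted above can be sketched in Java. This is only an illustration of the wording in the documentation, not the Hive UDF source; the class and method names are hypothetical. The two dates are 4 calendar months apart, the day/time offset is (28 d 10:30) - (30 d 00:00) = -1.5625 days, and -1.5625 / 31 = -0.05040323, which gives 3.94959677.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.LocalDateTime;

// Hypothetical sketch of the documented months_between rule; not Hive's UDF code.
final class MonthsBetweenSketch {

    private static final long SECONDS_PER_DAY = 86_400L;

    static boolean isLastDayOfMonth(LocalDateTime d) {
        return d.getDayOfMonth() == d.toLocalDate().lengthOfMonth();
    }

    // Seconds elapsed since the (notional) day 0 of the value's own month.
    static long secondsIntoMonth(LocalDateTime d) {
        return d.getDayOfMonth() * SECONDS_PER_DAY + d.toLocalTime().toSecondOfDay();
    }

    static double monthsBetween(LocalDateTime date1, LocalDateTime date2) {
        int wholeMonths = (date1.getYear() - date2.getYear()) * 12
                + (date1.getMonthValue() - date2.getMonthValue());

        // Same day of month, or both last days of their months: integer result.
        if (date1.getDayOfMonth() == date2.getDayOfMonth()
                || (isLastDayOfMonth(date1) && isLastDayOfMonth(date2))) {
            return wholeMonths;
        }

        // Otherwise the fractional part is based on a 31-day month and
        // includes the difference in the time components.
        double fraction = (secondsIntoMonth(date1) - secondsIntoMonth(date2))
                / (31.0 * SECONDS_PER_DAY);

        // The documentation states the result is rounded to 8 decimal places.
        return BigDecimal.valueOf(wholeMonths + fraction)
                .setScale(8, RoundingMode.HALF_UP)
                .doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(monthsBetween(
                LocalDateTime.of(1997, 2, 28, 10, 30, 0),
                LocalDateTime.of(1996, 10, 30, 0, 0, 0)));
    }
}
```

Note that the sketch takes {{LocalDateTime}} values directly; how a String argument gets turned into one is exactly the part under discussion in this issue.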


So this one arbitrarily states that the Hive format for a Timestamp will work, 
even though the method parameters are DATE.  Since I am proposing this not be 
allowed, users should have to cast their String to a Timestamp and then to a 
Date so that it actually matches the method parameter.  Alternatively, a 
months_between variant that accepts timestamps can be introduced.

There is code in this UDF specifically to handle the TS format. It first tries 
to parse the string as a Timestamp and, if that fails, parses it as a Date.  
We have seen this before; it is *very* slow, since every record may end up 
throwing an Exception to denote that it is not a TS before the Date is parsed 
correctly.
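
To make the cost concrete, here is a minimal sketch of the try-Timestamp-then-Date pattern described above, next to a single-pass parse that uses an optional section in a {{DateTimeFormatter}} pattern. The class and method names are hypothetical and this is not the UDF's actual code; the single-pass variant is only shown for contrast and is not what this issue proposes (the proposal is to require an explicit cast).

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical illustration; names and structure are assumptions, not Hive code.
final class DateOrTimestampParsing {

    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd");
    // Square brackets mark an optional section, so both formats are accepted.
    private static final DateTimeFormatter DATE_OPTIONAL_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd[ HH:mm:ss]");

    // Fallback pattern: a date-only string pays for a thrown and caught
    // exception on every call before the second parse succeeds.
    static LocalDate parseWithFallback(String s) {
        try {
            return LocalDateTime.parse(s, TS).toLocalDate();
        } catch (DateTimeParseException notATimestamp) {
            return LocalDate.parse(s, DATE);
        }
    }

    // Single pass: the optional time section is skipped when absent, so a
    // valid date-only string does not trigger an exception.
    static LocalDate parseSinglePass(String s) {
        return LocalDate.parse(s, DATE_OPTIONAL_TIME);
    }

    public static void main(String[] args) {
        System.out.println(parseWithFallback("1996-10-30"));
        System.out.println(parseSinglePass("1997-02-28 10:30:00"));
    }
}
```

Both methods return the same {{LocalDate}} for valid input; the difference is that the fallback version uses exception handling as control flow for every date-only record.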

 


was (Author: belugabehr):
Here is another example:

{quote}
months_between(date1, date2)

Returns number of months between dates date1 and date2 (as of Hive 1.2.0). If 
date1 is later than date2, then the result is positive. If date1 is earlier 
than date2, then the result is negative. If date1 and date2 are either the same 
days of the month or both last days of months, then the result is always an 
integer. Otherwise the UDF calculates the fractional portion of the result 
based on a 31-day month and considers the difference in time components date1 
and date2. date1 and date2 type can be date, timestamp or string in the format 
'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The result is rounded to 8 decimal 
places. Example: months_between('1997-02-28 10:30:00', '1996-10-30') = 
3.94959677
{quote}


So this one arbitrarily states that the Hive format for a Timestamp will work, 
even though the method parameters are DATE. This is probably only the case 
because it relies on the current Date parsing code which allows for such 
things.  Since I am proposing this not be allowed, users should have to cast 
their String to a Timestamp and then to a Date so that it actually matches the 
method parameter.  Alternatively, a months_between with timestamps can be 
introduced.

 

> Harmonize Hive Date-Time Formats
> --------------------------------
>
>                 Key: HIVE-24814
>                 URL: https://issues.apache.org/jira/browse/HIVE-24814
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Harmonize Hive on JDK date-time formats courtesy of {{DateTimeFormatter}}



