[ 
https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882942#comment-17882942
 ] 

Stamatis Zampetakis commented on HIVE-27586:
--------------------------------------------

In the light of HIVE-28483, I performed a series of tests to document the 
behavior of parsing dates from strings across some major Hive versions. Date 
parsing appears in various places and may differ slightly across SQL functions 
so in the tests that follow I only examined the results of the CAST (V AS DATE) 
which is probably the most popular way of performing string to date 
conversions. For various SQL functions, the behavior of the vectorized and 
non-vectorized implementation is not aligned so in the tests I included both 
variants.

 !cast_string_date_hive_versions.svg! 

The tests were performed using the script in  [^cast_as_date.q] file and were 
run using the following command.

{noformat}
mvn test -Dtest=TestCliDriver -Dqfile=cast_as_date.q -Phadoop-2  
-Dtest.output.overwrite
{noformat}

Note that hadoop-2 profile is necessary for building older versions of Hive.

> Parse dates from strings ignoring trailing (potentialy) invalid chars
> ---------------------------------------------------------------------
>
>                 Key: HIVE-27586
>                 URL: https://issues.apache.org/jira/browse/HIVE-27586
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 4.0.0-beta-1
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: backwards-compatibility, pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: cast_as_date.q, cast_string_date_hive_versions.pdf, 
> cast_string_date_hive_versions.png, cast_string_date_hive_versions.svg
>
>
> The goal of this ticket is to extract and return a valid date from a string 
> value when there is a valid date prefix in the string.
> The following table contains a few illustrative examples highlighting what 
> happens now and what will happen after the proposed changes to ignore 
> trailing characters. HIVE-20007 introduced some behavior changes around this 
> area so the table also displays what was the Hive behavior before that change.
> ||ID||String value||Before HIVE-20007||Current behavior||Ignore trailing 
> chars||
> |1|2023-08-03_16:02:00|2023-08-03|null|2023-08-03|
> |2|2023-08-03-16:02:00|2023-08-03|null|2023-08-03|
> |3|2023-08-0316:02:00|2024-06-11|null|2023-08-03|
> |4|03-08-2023|0009-02-12|null|0003-08-20|
> |5|2023-08-03 GARBAGE|2023-08-03|2023-08-03|2023-08-03|
> |6|2023-08-03TGARBAGE|2023-08-03|2023-08-03|2023-08-03|
> |7|2023-08-03_GARBAGE|2023-08-03|null|2023-08-03|
> This change partially (see example 3 and 4) restores the behavior changes 
> introduced by HIVE-20007 and at the same time makes the current behavior of 
> handling trailing invalid chars more uniform. 
> This change will have an impact on various Hive SQL functions and operators 
> (+/-) that accept dates from string values. A partial list of affected 
> functions is outlined below:
> * CAST (V AS DATE)
> * CAST (V AS TIMESTAMP)
> * TO_DATE
> * DATE_ADD
> * DATE_DIFF
> * WEEKOFYEAR
> * DAYOFWEEK
> * TRUNC



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to