[
https://issues.apache.org/jira/browse/HIVE-27772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stamatis Zampetakis resolved HIVE-27772.
----------------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed
Fixed in
https://github.com/apache/hive/commit/1c126d947448ffc9784a1465306e018ba183a014.
Thanks for the PR [~simhadri-g]!
[~simhadri-g] If you need to update the description here based on the last
changes that were merged please do so.
Please update the [Hive
wiki|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Datetime]
with the new configuration property introduced here.
> UNIX_TIMESTAMP should return NULL when date fields are out of bounds
> --------------------------------------------------------------------
>
> Key: HIVE-27772
> URL: https://issues.apache.org/jira/browse/HIVE-27772
> Project: Hive
> Issue Type: Bug
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
>
> For invalid dates such as 2001-02-31, 2023-04-31 etc, UNIX_TIMESTAMP() is
> giving out the timestamp value as the last valid date, rather than NULL.
> (e.g. UNIX_TIMESTAMP('2001-02-31', 'yyyy-MM-dd') gives 983354400, which
> converts to '2001-02-28'. However, for calendar days larger than 31, e.g.
> 2001-02-32, or 2023-04-32, UNIX_TIMESTAMP() would give NULL as a result.
> In Spark and mysql, UNIX_TIMESTMAP for these invalid dates are all NULL (or
> 0).
>
> {noformat}
> 6: jdbc:hive2://localhost:10001/> select month, datetimestamp,
> unix_timestamp(datetimestamp, 'yyyy-MM-dd') as timestampCol from
> datetimetable;
> INFO : Compiling
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62):
> select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as
> timestampCol from datetimetable
> INFO : No Stats for default@datetimetable, Columns: month, datetimestamp
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:month,
> type:string, comment:null), FieldSchema(name:datetimestamp, type:string,
> comment:null), FieldSchema(name:timestampcol, type:bigint, comment:null)],
> properties:null)
> INFO : Completed compiling
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62);
> Time taken: 0.102 seconds
> INFO : Operation QUERY obtained 0 locks
> INFO : Executing
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62):
> select month, datetimestamp, unix_timestamp(datetimestamp, 'yyyy-MM-dd') as
> timestampCol from datetimetable
> INFO : Completed executing
> command(queryId=root_20231005104216_8520e3e9-d03b-4e34-93ec-ddc280845c62);
> Time taken: 0.0 seconds
> +--------+----------------+---------------+
> | month | datetimestamp | timestampcol |
> +--------+----------------+---------------+
> | Feb | 2001-02-28 | 983318400 |
> | Feb | 2001-02-29 | 983318400 |
> | Feb | 2001-02-30 | 983318400 |
> | Feb | 2001-02-31 | 983318400 |
> | Feb | 2001-02-32 | NULL |
> +--------+----------------+---------------+
> 5 rows selected (0.131 seconds){noformat}
>
>
> It looks like
> [InstantDateTimeFormatter.java#L52|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/InstantDateTimeFormatter.java#L52]
> by default, the formatter has the SMART resolver style.
> According to java jdk :
> https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/java/time/format/ResolverStyle.java#L103
>
> {noformat}
> /**
> * Style to resolve dates and times strictly.
> * <p>
> * Using strict resolution will ensure that all parsed values are within
> * the outer range of valid values for the field. Individual fields may
> * be further processed for strictness.
> * <p>
> * For example, resolving year-month and day-of-month in the ISO calendar
> * system using strict mode will ensure that the day-of-month is valid
> * for the year-month, rejecting invalid values.
> */
> STRICT,
> /**
> * Style to resolve dates and times in a smart, or intelligent, manner.
> * <p>
> * Using smart resolution will perform the sensible default for each
> * field, which may be the same as strict, the same as lenient, or a third
> * behavior. Individual fields will interpret this differently.
> * <p>
> * For example, resolving year-month and day-of-month in the ISO calendar
> * system using smart mode will ensure that the day-of-month is from
> * 1 to 31, converting any value beyond the last valid day-of-month to be
> * the last valid day-of-month.
> */
> SMART,{noformat}
>
>
> Therefore, we should set the resolverStyle to STRICT to reject invalid date
> values.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)