zabetak opened a new pull request, #4675: URL: https://github.com/apache/hive/pull/4675
### What changes were proposed in this pull request?
1. Generalize the UnixTimeFormatter hierarchy so that it works with Instant, which can represent more than epochSeconds, so that it can be used by the GenericUDFDateFormat class.
2. Slightly adapt the other classes using the UnixTimeFormatter to pass or retrieve the necessary values from/to Instants.
3. Refactor GenericUDFDateFormat to use the InstantFormatter so that its behavior becomes configurable via the `hive.datetime.formatter` property.
4. Extend the use of `hive.datetime.formatter` to date_format and update the description in HiveConf, also mentioning the bugs that affect the SIMPLE formatter.
5. Add unit tests for date_format for both formatters covering:
   * Reference dates (1970-01-01) and trivial patterns
   * Invalid date inputs (Jul 9 2023) that cannot be parsed
   * Timestamp inputs with a timezone specification (1970-01-01 00:00:00 GMT-01:00) that is silently ignored/dropped
   * Current date (2023-07-21 09:13:10.123456789) with nano precision and pattern variations
   * Patterns ('u','SSSSSSSSS') with different behavior between formatters
   * Gregorian dates (1800-01-01 00:00:00) before 1900 and different timezones
   * Julian dates (1000-01-01 00:00:00) and different timezones

Essentially, the tests also demonstrate existing bugs that affect Gregorian dates before 1900 and Julian dates when the SIMPLE formatter is used.

### Why are the changes needed?
SimpleDateFormat and DateTimeFormatter behave differently, which leads to different query results when date_format is used (after upgrade). To avoid sudden changes in query results and to allow users to gradually migrate their query workloads to the new patterns, the `hive.datetime.formatter` property can now be used to select which formatter date_format will use.

### Does this PR introduce _any_ user-facing change?
Yes, it allows configuring the behavior of the `date_format` function via the `hive.datetime.formatter` property.

### Is the change a dependency upgrade?
No

### How was this patch tested?
```
mvn test -pl itests/qtest -Pitests -Dtest=TestMiniLlapLocalCliDriver -Dqfile_regex=.*(date|timestamp).*
mvn test -pl ql -Dtest=TestGenericUDF*Date*
mvn test -pl ql -Dtest=TestGenericUDF*Timestamp*
mvn test -pl ql -Dtest=TestGenericUDF*Unix*
```
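
For readers unfamiliar with the formatter differences mentioned under "Why are the changes needed?", the following is a minimal sketch using only plain JDK classes. It is not code from this patch and does not touch Hive; the class name `FormatterDifferences` and the helpers `legacy` and `modern` are hypothetical names chosen for illustration. It formats the timestamp used in the tests (2023-07-21 09:13:10.123456789) with the two patterns listed above, 'u' and 'SSSSSSSSS', assuming UTC and `Locale.ROOT`.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration class; not part of Hive or of this pull request.
class FormatterDifferences {

    // Format with the legacy java.text.SimpleDateFormat, interpreting ts as UTC.
    // java.util.Date only keeps millisecond precision, so nanos are truncated.
    static String legacy(String pattern, LocalDateTime ts) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(Date.from(ts.toInstant(ZoneOffset.UTC)));
    }

    // Format with java.time.format.DateTimeFormatter.
    static String modern(String pattern, LocalDateTime ts) {
        return DateTimeFormatter.ofPattern(pattern, Locale.ROOT).format(ts);
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2023, 7, 21, 9, 13, 10, 123456789);

        // 'u' is "day number of week" in SimpleDateFormat but "year" in DateTimeFormatter.
        System.out.println(legacy("u", ts)); // 5 (Friday)
        System.out.println(modern("u", ts)); // 2023

        // 'S' is "millisecond" in SimpleDateFormat but "fraction of second" in DateTimeFormatter.
        System.out.println(legacy("SSSSSSSSS", ts)); // 000000123
        System.out.println(modern("SSSSSSSSS", ts)); // 123456789
    }
}
```

The same pattern string therefore yields different text depending on the formatter class, which is the kind of result change that the `hive.datetime.formatter` property lets users control for `date_format`.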