zabetak opened a new pull request, #4675:
URL: https://github.com/apache/hive/pull/4675

   ### What changes were proposed in this pull request?
   1. Generalize the UnixTimeFormatter hierarchy to work with Instant, which can represent more than just epoch seconds (e.g., nanoseconds), so that it can be used by the GenericUDFDateFormat class.
   2. Slightly adapt the other classes that use UnixTimeFormatter so that they pass or retrieve the necessary values to/from Instants.
   3. Refactor GenericUDFDateFormat to use the InstantFormatter so that its behavior becomes configurable via the `hive.datetime.formatter` property.
   4. Extend the use of `hive.datetime.formatter` to `date_format` and update the property description in HiveConf, also mentioning the bugs that affect the SIMPLE formatter.
   5. Add unit tests for `date_format` with both formatters, covering:
      * Reference dates (1970-01-01) and trivial patterns
      * Invalid date inputs (Jul 9 2023) that cannot be parsed
      * Timestamp inputs with a timezone specification (1970-01-01 00:00:00 GMT-01:00) that is silently ignored/dropped
      * A current date (2023-07-21 09:13:10.123456789) with nanosecond precision and pattern variations
      * Patterns ('u', 'SSSSSSSSS') that behave differently between the formatters
      * Gregorian dates before 1900 (1800-01-01 00:00:00) in different timezones
      * Julian dates (1000-01-01 00:00:00) in different timezones
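   The Instant-based design described in items 1-4 can be sketched as follows. This is a minimal illustration, not Hive's actual InstantFormatter API: `InstantFormatterSketch`, `forConfig` and the nested types are hypothetical names, and the sketch assumes that `hive.datetime.formatter` accepts the values SIMPLE and DATETIME. It shows the two points that matter: an Instant keeps nanoseconds, and a single property value selects between a SimpleDateFormat-backed and a DateTimeFormatter-backed implementation.

   ```java
   import java.text.SimpleDateFormat;
   import java.time.Instant;
   import java.time.ZoneId;
   import java.time.format.DateTimeFormatter;
   import java.util.Date;
   import java.util.Locale;
   import java.util.TimeZone;

   public class InstantFormatterSketch {

     /** Hypothetical abstraction: formats an Instant, which carries nanoseconds, not just epoch seconds. */
     interface InstantFormatter {
       String format(Instant instant);
     }

     /** java.time based: keeps nanosecond precision and uses the proleptic Gregorian calendar. */
     static final class DateTimeInstantFormatter implements InstantFormatter {
       private final DateTimeFormatter formatter;

       DateTimeInstantFormatter(String pattern, ZoneId zone) {
         this.formatter = DateTimeFormatter.ofPattern(pattern, Locale.US).withZone(zone);
       }

       @Override
       public String format(Instant instant) {
         return formatter.format(instant);
       }
     }

     /** Legacy: java.util.Date truncates to milliseconds; SimpleDateFormat is not thread-safe. */
     static final class SimpleInstantFormatter implements InstantFormatter {
       private final SimpleDateFormat formatter;

       SimpleInstantFormatter(String pattern, ZoneId zone) {
         this.formatter = new SimpleDateFormat(pattern, Locale.US);
         this.formatter.setTimeZone(TimeZone.getTimeZone(zone));
       }

       @Override
       public String format(Instant instant) {
         return formatter.format(Date.from(instant));
       }
     }

     /** Picks an implementation from a hive.datetime.formatter value (assumed: SIMPLE or DATETIME). */
     static InstantFormatter forConfig(String value, String pattern, ZoneId zone) {
       switch (value.toUpperCase(Locale.ROOT)) {
         case "SIMPLE":
           return new SimpleInstantFormatter(pattern, zone);
         case "DATETIME":
           return new DateTimeInstantFormatter(pattern, zone);
         default:
           throw new IllegalArgumentException("Unknown formatter: " + value);
       }
     }

     public static void main(String[] args) {
       ZoneId utc = ZoneId.of("UTC");
       for (String kind : new String[] {"SIMPLE", "DATETIME"}) {
         InstantFormatter f = forConfig(kind, "yyyy-MM-dd HH:mm:ss", utc);
         System.out.println(kind + ": " + f.format(Instant.EPOCH));
       }
     }
   }
   ```

   For the reference date and a trivial pattern both variants print `1970-01-01 00:00:00`. Because the SIMPLE variant wraps a SimpleDateFormat, which is not thread-safe, a real implementation would need one instance per thread or per UDF instance.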
   
   The tests also demonstrate existing bugs in the SIMPLE formatter that affect Gregorian dates before 1900 and Julian dates.
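   The Julian-date discrepancy can be reproduced with plain JDK classes. A minimal sketch, assuming UTC and the pattern `yyyy-MM-dd HH:mm:ss` (both chosen here for illustration, not taken from the Hive tests): SimpleDateFormat relies on `java.util.GregorianCalendar`, which applies Julian calendar rules before the 1582-10-15 cutover, whereas DateTimeFormatter uses the proleptic Gregorian (ISO) calendar throughout. The sketch does not cover the pre-1900 timezone behavior.

   ```java
   import java.text.SimpleDateFormat;
   import java.time.Instant;
   import java.time.ZoneOffset;
   import java.time.format.DateTimeFormatter;
   import java.util.Date;
   import java.util.Locale;
   import java.util.TimeZone;

   public class JulianDateDifference {

     static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

     /** Hybrid calendar: Julian rules apply to instants before 1582-10-15. */
     static String simple(Instant instant) {
       SimpleDateFormat sdf = new SimpleDateFormat(PATTERN, Locale.US);
       sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
       return sdf.format(Date.from(instant));
     }

     /** Proleptic Gregorian (ISO) calendar for every date. */
     static String dateTime(Instant instant) {
       return DateTimeFormatter.ofPattern(PATTERN, Locale.US)
           .withZone(ZoneOffset.UTC)
           .format(instant);
     }

     public static void main(String[] args) {
       // Instant.parse reads the text in the proleptic Gregorian calendar
       Instant julianEra = Instant.parse("1000-01-01T00:00:00Z");
       System.out.println("SIMPLE:   " + simple(julianEra));
       System.out.println("DATETIME: " + dateTime(julianEra));
     }
   }
   ```

   DateTimeFormatter prints `1000-01-01 00:00:00`, while SimpleDateFormat prints a late-December 0999 date, because in that era the Julian calendar lags the Gregorian one by several days.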
   
   ### Why are the changes needed?
   SimpleDateFormat and DateTimeFormatter behave differently (e.g., in how they interpret some pattern letters and in how they handle historical dates), so `date_format` can return different query results after an upgrade.
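   The pattern-level differences can be observed without Hive. A minimal sketch that calls the JDK formatters directly rather than `date_format`, assuming UTC and the timestamp 2023-07-21 09:13:10.123456789 used in the tests listed earlier:

   ```java
   import java.text.SimpleDateFormat;
   import java.time.Instant;
   import java.time.ZoneOffset;
   import java.time.format.DateTimeFormatter;
   import java.util.Date;
   import java.util.Locale;
   import java.util.TimeZone;

   public class PatternDifferences {

     /** Legacy SimpleDateFormat, as used by the SIMPLE formatter. */
     static String simple(String pattern, Instant instant) {
       SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.US);
       sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
       // java.util.Date keeps only milliseconds, so the digits after .123 are lost
       return sdf.format(Date.from(instant));
     }

     /** java.time DateTimeFormatter, as used by the DateTimeFormatter-based option. */
     static String dateTime(String pattern, Instant instant) {
       return DateTimeFormatter.ofPattern(pattern, Locale.US)
           .withZone(ZoneOffset.UTC)
           .format(instant);
     }

     public static void main(String[] args) {
       Instant ts = Instant.parse("2023-07-21T09:13:10.123456789Z");
       for (String p : new String[] {"u", "SSSSSSSSS"}) {
         System.out.println(p + ": SIMPLE=" + simple(p, ts) + " DATETIME=" + dateTime(p, ts));
       }
     }
   }
   ```

   For `u`, SimpleDateFormat prints the day number of the week (`5`, a Friday) while DateTimeFormatter prints the year (`2023`). For `SSSSSSSSS`, SimpleDateFormat zero-pads the milliseconds (`000000123`) while DateTimeFormatter prints the nanosecond fraction (`123456789`).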
   
   To avoid sudden changes in query results and to let users gradually migrate their query workloads to the new patterns, the `hive.datetime.formatter` property can now be used to select which formatter `date_format` uses.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, it allows users to configure the behavior of the `date_format` function via the `hive.datetime.formatter` property.
   
   ### Is the change a dependency upgrade?
   No
   
   ### How was this patch tested?
   ```
   mvn test -pl itests/qtest -Pitests -Dtest=TestMiniLlapLocalCliDriver -Dqfile_regex='.*(date|timestamp).*'
   mvn test -pl ql -Dtest=TestGenericUDF*Date*
   mvn test -pl ql -Dtest=TestGenericUDF*Timestamp*
   mvn test -pl ql -Dtest=TestGenericUDF*Unix*
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

