timsants opened a new pull request, #10721: URL: https://github.com/apache/pinot/pull/10721
**Context**

There is a use case using Kafka. The time column was configured with a time format that was working as expected. However, when a minion task leveraging the `SegmentProcessorFramework` ran, a date format exception was thrown:

```
Caught exception while parsing simple date format: yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSSZ with value: 2023-03-07T21:18:05.072784135Z
  at org.apache.pinot.segment.spi.creator.name.NormalizedDateSegmentNameGenerator.getNormalizedDate(NormalizedDateSegmentNameGenerator.java:142)
```

Specifically, `SegmentColumnarIndexCreator` uses the Joda-Time `DateTimeFormatter` to parse time values in the configured format into epoch millis (see [SegmentColumnarIndexCreator](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java#L446)). Segment generation configured to use the `NormalizedDateSegmentNameGenerator`, however, may fail because that class uses a different date-time library, `java.text.SimpleDateFormat`. This library seems to be stricter than Joda-Time: if a time column value contains a literal `Z`, the time format must have the `Z` in single quotes. E.g. `2023-01-01T12:00:00.111111111Z` needs a time format of `yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS'Z'`, otherwise parsing fails.

**Changes**

This PR makes `NormalizedDateSegmentNameGenerator` use the Joda-Time formatter as well. Joda-Time is no longer under active development, but a Pinot-wide migration may be a larger effort at the moment. This PR aims to unify the time library in use until we are ready to switch to `java.time`.
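The `SimpleDateFormat` behavior described in the Context section can be reproduced with the JDK alone. The snippet below is a minimal sketch, not code from this PR: the class `SimpleDateFormatZDemo` and its `parses` helper are hypothetical names used only for illustration, and the patterns and value are the ones quoted above. It only checks whether parsing succeeds, not what instant the parse produces.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical demo class for illustration; not part of Pinot.
final class SimpleDateFormatZDemo {

  // Returns true if java.text.SimpleDateFormat accepts the value for the given pattern.
  static boolean parses(String pattern, String value) {
    try {
      new SimpleDateFormat(pattern, Locale.ENGLISH).parse(value);
      return true;
    } catch (ParseException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String value = "2023-01-01T12:00:00.111111111Z";

    // Unquoted Z is a time zone pattern letter (RFC 822 offset such as +0000),
    // so the literal "Z" at the end of the value is not accepted.
    System.out.println("unquoted Z parses: "
        + parses("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSSZ", value));

    // Quoted 'Z' is matched as a literal character.
    System.out.println("quoted 'Z' parses: "
        + parses("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS'Z'", value));
  }
}
```

Running `main` should report that only the quoted pattern parses, which matches the exception in the stack trace above for the unquoted format.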
