timsants opened a new pull request, #10721:
URL: https://github.com/apache/pinot/pull/10721

   **Context**
   
   In a Kafka-based use case, the time column was configured with a time 
format that worked as expected. However, when running a minion task that 
leverages the `SegmentProcessorFramework`, a date format exception was thrown: 
   ```
   Caught exception while parsing simple date format: 
yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSSZ with value: 2023-03-07T21:18:05.072784135Z at 
org.apache.pinot.segment.spi.creator.name.NormalizedDateSegmentNameGenerator.getNormalizedDate(NormalizedDateSegmentNameGenerator.java:142)
   ```
   
   Specifically, `SegmentColumnarIndexCreator` uses the Joda-Time 
`DateTimeFormatter` to parse time values into epoch millis (see 
[SegmentColumnarIndexCreator](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java#L446)).
 
   
   However, any segment generation configured to use the 
`NormalizedDateSegmentNameGenerator` may fail, because that class uses a 
different date-time library, `java.text.SimpleDateFormat`. This library seems 
to be stricter than Joda-Time: if the time column values contain a literal `Z`, 
the time format must have the `Z` in single quotes. E.g. 
"2023-01-01T12:00:00.111111111Z" needs a time format of 
"yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS'Z'", otherwise parsing will fail.
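
   The difference can be reproduced with the JDK alone. The following is a 
minimal sketch, not code from this PR: the class name `SimpleDateFormatZDemo` 
and the helper `parses` are hypothetical, and it only checks whether 
`SimpleDateFormat` accepts the value with an unquoted versus a quoted `Z`.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical demo class (not part of Pinot).
final class SimpleDateFormatZDemo {

    // Returns true if java.text.SimpleDateFormat can parse `value` with `pattern`.
    static boolean parses(String pattern, String value) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        try {
            format.parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String value = "2023-01-01T12:00:00.111111111Z";

        // Unquoted Z is a pattern letter (RFC 822 zone such as -0800),
        // so the literal "Z" in the value is not accepted.
        System.out.println(parses("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSSZ", value));

        // Quoted 'Z' is matched as a literal character.
        System.out.println(parses("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS'Z'", value));
    }
}
```

   Note that `SimpleDateFormat` treats `S` as a millisecond count, so even 
when the quoted pattern parses, a nine-digit fraction is not interpreted as 
nanoseconds.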
   
   **Changes**
   
   This PR makes the `NormalizedDateSegmentNameGenerator` also use the Joda-Time 
date time formatter. Joda-Time is no longer actively developed, but a 
Pinot-wide migration would be a larger effort at the moment. This PR aims to 
unify the time library until we are ready to switch to `java.time`.
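
   For reference when that migration happens, here is a hedged sketch of the 
same parse with `java.time`. It is not part of this PR; the class name 
`JavaTimeParseSketch` and the helper `toEpochMillis` are hypothetical. The 
pattern uses `X` for the zone, which is an assumption about how such a format 
would be written for `java.time`, since `X` is the letter that accepts a 
literal `Z` for a zero offset.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical sketch class (not part of Pinot).
final class JavaTimeParseSketch {

    // Parses `value` with `pattern` and returns the instant as epoch millis.
    static long toEpochMillis(String pattern, String value) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern, Locale.ROOT);
        return OffsetDateTime.parse(value, formatter).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        // Value taken from the exception message above.
        String value = "2023-03-07T21:18:05.072784135Z";
        // In java.time, nine S letters mean a nine-digit fraction of a second.
        String pattern = "yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSSX";
        System.out.println(toEpochMillis(pattern, value)); // 1678223885072
    }
}
```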
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

