GWphua commented on code in PR #18477: URL: https://github.com/apache/druid/pull/18477#discussion_r2321411252
########## processing/src/test/java/org/apache/druid/java/util/common/IntervalsTest.java: ########## @@ -79,6 +79,44 @@ public void testFindOverlappingInterval() ); } + @Test + public void testValidIntervalStrings() + { + final String[] intervalStringRepresentations = new String[]{ + // Tests that use does not fallback to Intervals.of() + // Zulu with millis + "2022-01-01T00:00:00.000Z/2022-01-02T00:00:00.000Z", + "2021-03-14T12:34:56.789Z/2021-03-15T12:34:56.789Z", + + // Offset with colon + "2022-01-01T00:00:00.000+05:30/2022-01-01T01:00:00.000+05:30", + "2022-01-01T07:00:00.000-07:00/2022-01-01T08:00:00.000-07:00", + + // Basic offset without colon + "2022-01-01T00:00:00.000+0530/2022-01-01T01:00:00.000+0530", + + // Tests that fallback to Intervals.of() + // Zulu without millis + "2022-01-01T00:00:00Z/2022-01-02T00:00:00Z", + // Date-only + "2022-01-01/2022-01-02", + // start/period + "2022-01-01T00:00:00.000Z/P1D", + "2022-01-01T12:00:00Z/PT6H", + "2022-01-01T00:00:00Z/P2DT3H4M5S", + // period/end + "P1D/2022-01-02T00:00:00.000Z", + "PT6H/2022-01-01T18:00:00Z", + "P2DT3H4M5S/2022-01-03T03:04:05Z" Review Comment: Hi @kfaraz I attempted to make `tryOptimizedIntervalDeserialization` more inclusive (e.g. Work on Date-only, and Without Millis case), but this requires changing the `FAST_ISO_UTC_FORMATTER` DateTimeFormatter to a more flexible formatter. This causes performance to resemble the un-optimized versions. These 2 formatters allow formats like `"2022-01-01/2022-01-02"` or `"2022-01-01T00:00:00Z/2022-01-02T00:00:00Z"`, but the performance is lacking... ``` private static final DateTimeFormatter FAST_ISO_UTC_FORMATTER = ISODateTimeFormat.dateOptionalTimeParser() .withOffsetParsed() .withChronology(ISOChronology.getInstanceUTC()); Benchmark (numValues) Mode Cnt Score Error Units JodaIntervalDeserializationBenchmark.deserializeLegacy 20000 avgt 2 21.243 ms/op JodaIntervalDeserializationBenchmark.deserializeLegacyFallback 20000 avgt 2 15.217 ms/op JodaIntervalDeserializationBenchmark.deserializeOptimized 20000 avgt 2 20.605 ms/op JodaIntervalDeserializationBenchmark.deserializeOptimizedFallback 20000 avgt 2 31.051 ms/op ``` ``` private static final DateTimeFormatter FAST_ISO_UTC_FORMATTER = ISODateTimeFormat.dateTimeParser().withChronology(ISOChronology.getInstanceUTC()); Benchmark (numValues) Mode Cnt Score Error Units JodaIntervalDeserializationBenchmark.deserializeLegacy 20000 avgt 3 21.502 ± 0.995 ms/op JodaIntervalDeserializationBenchmark.deserializeLegacyFallback 20000 avgt 3 15.365 ± 1.227 ms/op JodaIntervalDeserializationBenchmark.deserializeOptimized 20000 avgt 3 21.239 ± 1.543 ms/op JodaIntervalDeserializationBenchmark.deserializeOptimizedFallback 20000 avgt 3 31.526 ± 3.083 ms/op ``` I think we can simply take advantage of knowing how our data will look like, and optimize the parse process for the most frequently-used data format. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org