kfaraz commented on code in PR #18477:
URL: https://github.com/apache/druid/pull/18477#discussion_r2325372696
##########
processing/src/main/java/org/apache/druid/java/util/common/Intervals.java:
##########
@@ -53,6 +57,74 @@ public static Interval of(String format, Object...
formatArgs)
return of(StringUtils.format(format, formatArgs));
}
+ /**
+ * A performance-optimized method for parsing a Joda-Time {@link Interval}
from a string.
+ * This method is significantly faster than the standard {@link
Intervals#of(String)} for the following
+ * group of offsets:
+ * <ol>
+ * <li>"2022-01-01T00:00:00.000Z/2022-01-02T00:00:00.000Z"</li>
+ * <li>"2022-01-01T00:00:00.000+05:30/2022-01-01T01:00:00.000+05:30"</li>
+ * <li>"2022-01-01T00:00:00.000+0530/2022-01-01T01:00:00.000+0530"</li>
+ * </ol>
+ * <p>
+ * If the input string does not match the format, it will fall back to the
more flexible but
+ * slower {@link Intervals#of(String)} parser. If you are dealing with any
Intervals format examples below,
+ * consider using {@link Intervals#of(String)} instead:
+ * <ol>
+ * <li>"2022-01-01T00:00:00Z/2022-01-02T00:00:00Z" (without millis)</li>
+ * <li>"2022-01-01/2022-01-02" (Date only)</li>
+ * <li>"2022-01-01T12:00:00.000Z/PT6H" (Periods in start / end)</li>
+ * </ol>
+ *
+ * Currently, this method is only used in {@link
org.apache.druid.timeline.SegmentId}.
+ * Should you find additional places where the Interval format is guaranteed
to be compatible with this method,
+ * feel free to open a PR.
Review Comment:
Nit: We can leave out these lines.
##########
processing/src/main/java/org/apache/druid/java/util/common/Intervals.java:
##########
@@ -25,13 +25,17 @@
import org.joda.time.DateTime;
import org.joda.time.Interval;
import org.joda.time.chrono.ISOChronology;
+import org.joda.time.format.DateTimeFormatter;
+import org.joda.time.format.ISODateTimeFormat;
import javax.annotation.Nullable;
public final class Intervals
{
public static final Interval ETERNITY = utc(JodaUtils.MIN_INSTANT,
JodaUtils.MAX_INSTANT);
public static final ImmutableList<Interval> ONLY_ETERNITY =
ImmutableList.of(ETERNITY);
+ private static final DateTimeFormatter FAST_ISO_UTC_FORMATTER =
Review Comment:
Maybe also check out the other two constant formatters in that class. See if
they can help improve performance of the new method for other date formats. But
this can be done in a follow up PR.
##########
processing/src/main/java/org/apache/druid/java/util/common/Intervals.java:
##########
@@ -25,13 +25,17 @@
import org.joda.time.DateTime;
import org.joda.time.Interval;
import org.joda.time.chrono.ISOChronology;
+import org.joda.time.format.DateTimeFormatter;
+import org.joda.time.format.ISODateTimeFormat;
import javax.annotation.Nullable;
public final class Intervals
{
public static final Interval ETERNITY = utc(JodaUtils.MIN_INSTANT,
JodaUtils.MAX_INSTANT);
public static final ImmutableList<Interval> ONLY_ETERNITY =
ImmutableList.of(ETERNITY);
+ private static final DateTimeFormatter FAST_ISO_UTC_FORMATTER =
Review Comment:
I think you can use `DateTimes.ISO_DATE_TIME` instead of this since it uses
the same formatter internally.
##########
processing/src/main/java/org/apache/druid/java/util/common/Intervals.java:
##########
@@ -53,6 +57,74 @@ public static Interval of(String format, Object...
formatArgs)
return of(StringUtils.format(format, formatArgs));
}
+ /**
+ * A performance-optimized method for parsing a Joda-Time {@link Interval}
from a string.
+ * This method is significantly faster than the standard {@link
Intervals#of(String)} for the following
+ * group of offsets:
+ * <ol>
+ * <li>"2022-01-01T00:00:00.000Z/2022-01-02T00:00:00.000Z"</li>
+ * <li>"2022-01-01T00:00:00.000+05:30/2022-01-01T01:00:00.000+05:30"</li>
+ * <li>"2022-01-01T00:00:00.000+0530/2022-01-01T01:00:00.000+0530"</li>
+ * </ol>
+ * <p>
+ * If the input string does not match the format, it will fall back to the
more flexible but
+ * slower {@link Intervals#of(String)} parser. If you are dealing with any
Intervals format examples below,
+ * consider using {@link Intervals#of(String)} instead:
+ * <ol>
+ * <li>"2022-01-01T00:00:00Z/2022-01-02T00:00:00Z" (without millis)</li>
+ * <li>"2022-01-01/2022-01-02" (Date only)</li>
+ * <li>"2022-01-01T12:00:00.000Z/PT6H" (Periods in start / end)</li>
+ * </ol>
+ *
+ * Currently, this method is only used in {@link
org.apache.druid.timeline.SegmentId}.
+ * Should you find additional places where the Interval format is guaranteed
to be compatible with this method,
+ * feel free to open a PR.
+ */
+ public static Interval fromString(String string)
+ {
+ Interval interval = null;
+ if (canDeserializeIntervalOptimallyFromString(string)) {
+ interval = tryOptimizedIntervalDeserialization(string);
+ }
+
+ if (interval == null) {
+ return Intervals.of(string);
+ } else {
+ return interval;
+ }
Review Comment:
Nit:
```suggestion
return interval == null ? Intervals.of(string) : interval;
```
##########
processing/src/main/java/org/apache/druid/timeline/DataSegment.java:
##########
@@ -196,6 +197,37 @@ public DataSegment(
@JsonProperty("size") long size,
@JacksonInject PruneSpecsHolder pruneSpecsHolder
)
+ {
+ this(
Review Comment:
Please add a comment here mentioning that we take `interval` input as String
so that we can deserialize it optimally.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]