Jackie-Jiang commented on code in PR #14298:
URL: https://github.com/apache/pinot/pull/14298#discussion_r1819836993
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/transformer/datetime/BaseDateTimeTransformer.java:
##########
@@ -42,14 +46,26 @@ public abstract class BaseDateTimeTransformer<I, O>
implements DataTransformer<I
private final DateTimeFormatter _outputDateTimeFormatter;
private final DateTimeGranularitySpec _outputGranularity;
private final long _outputGranularityMillis;
- private final SDFDateTimeTruncate _dateTimeTruncate;
+ private final SDFDateTimeTruncate _sdfDateTimeTruncate;
+ private final MillisDateTimeTruncate _millisDateTimeTruncate;
+ private final Chronology _bucketingChronology;
+ private final MutableDateTime _dateTime;
+ private final StringBuilder _printBuffer;
private interface SDFDateTimeTruncate {
- String truncate(DateTime dateTime);
+ String truncate(MutableDateTime dateTime);
}
- public BaseDateTimeTransformer(@Nonnull DateTimeFormatSpec inputFormat,
@Nonnull DateTimeFormatSpec outputFormat,
- @Nonnull DateTimeGranularitySpec outputGranularity) {
+ private interface MillisDateTimeTruncate {
+ long truncate(MutableDateTime dateTime);
+ }
+
+ public BaseDateTimeTransformer(
+ @Nonnull DateTimeFormatSpec inputFormat,
Review Comment:
(Convention) We usually only annotate `Nullable` and treat others as
`Nonnull`
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/transformer/datetime/EpochToEpochTransformer.java:
##########
@@ -29,14 +31,20 @@
public class EpochToEpochTransformer extends BaseDateTimeTransformer<long[],
long[]> {
public EpochToEpochTransformer(DateTimeFormatSpec inputFormat,
DateTimeFormatSpec outputFormat,
- DateTimeGranularitySpec outputGranularity) {
- super(inputFormat, outputFormat, outputGranularity);
+ DateTimeGranularitySpec outputGranularity, @Nullable DateTimeZone
bucketingTz) {
+ super(inputFormat, outputFormat, outputGranularity, bucketingTz);
}
@Override
public void transform(@Nonnull long[] input, @Nonnull long[] output, int
length) {
- for (int i = 0; i < length; i++) {
- output[i] =
transformMillisToEpoch(transformToOutputGranularity(transformEpochToMillis(input[i])));
+ if (useCustomBucketingTimeZone()) {
Review Comment:
We can still short-circuit it if it is UTC. Same for other places.
I guess we can check if the explicitly specified time zone is the same as
output format time zone, and count it as custom time zone only when they are
different.
##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/DateTimeConvert.java:
##########
@@ -42,36 +51,43 @@ public Object dateTimeConvert(String timeValueStr, String
inputFormatStr, String
_inputFormatSpec = new DateTimeFormatSpec(inputFormatStr);
_outputFormatSpec = new DateTimeFormatSpec(outputFormatStr);
_granularitySpec = new DateTimeGranularitySpec(outputGranularityStr);
+ _dateTime = new MutableDateTime(0L, DateTimeZone.UTC);
+ _buffer = new StringBuilder();
Review Comment:
Consider adding some javadoc explaining we reuse date time and string
builder to reduce garbage
##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/DateTimeConvert.java:
##########
@@ -83,4 +99,67 @@ public Object dateTimeConvert(String timeValueStr, String
inputFormatStr, String
return _outputFormatSpec.fromMillisToFormat(roundedTimeValueMs);
}
}
+
+ @ScalarFunction
+ public Object dateTimeConvert(String timeValueStr, String inputFormatStr,
String outputFormatStr,
Review Comment:
It is almost the same as the other method. Can we introduce a helper method
with nullable `bucketingTimeZone`, and make both the scalar function call the
helper method
##########
pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java:
##########
@@ -115,8 +115,14 @@ public enum TransformFunctionType {
// Date time functions
TIME_CONVERT("timeConvert", ReturnTypes.BIGINT,
OperandTypes.family(List.of(SqlTypeFamily.ANY, SqlTypeFamily.CHARACTER,
SqlTypeFamily.CHARACTER))),
- DATE_TIME_CONVERT("dateTimeConvert",
TransformFunctionType::dateTimeConverterReturnTypeInference,
OperandTypes.family(
- List.of(SqlTypeFamily.ANY, SqlTypeFamily.CHARACTER,
SqlTypeFamily.CHARACTER, SqlTypeFamily.CHARACTER))),
+ DATE_TIME_CONVERT("dateTimeConvert",
TransformFunctionType::dateTimeConverterReturnTypeInference,
+ OperandTypes.or(
Review Comment:
We can use optional predicate instead. See `DATE_TRUNC` as an example
##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/DateTimeConvert.java:
##########
@@ -42,36 +51,43 @@ public Object dateTimeConvert(String timeValueStr, String
inputFormatStr, String
_inputFormatSpec = new DateTimeFormatSpec(inputFormatStr);
_outputFormatSpec = new DateTimeFormatSpec(outputFormatStr);
_granularitySpec = new DateTimeGranularitySpec(outputGranularityStr);
+ _dateTime = new MutableDateTime(0L, DateTimeZone.UTC);
+ _buffer = new StringBuilder();
}
long timeValueMs = _inputFormatSpec.fromFormatToMillis(timeValueStr);
if (_outputFormatSpec.getTimeFormat() ==
DateTimeFieldSpec.TimeFormat.SIMPLE_DATE_FORMAT) {
DateTimeFormatter outputFormatter =
_outputFormatSpec.getDateTimeFormatter();
- DateTime dateTime = new DateTime(timeValueMs, outputFormatter.getZone());
+ _dateTime.setMillis(timeValueMs);
+ _dateTime.setZone(outputFormatter.getZone());
Review Comment:
We can fix the time zone during the initialization as it won't change. Same
for other places where we update timezone per value
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/transformer/datetime/BaseDateTimeTransformer.java:
##########
@@ -58,41 +74,145 @@ public BaseDateTimeTransformer(@Nonnull DateTimeFormatSpec
inputFormat, @Nonnull
_outputDateTimeFormatter = outputFormat.getDateTimeFormatter();
_outputGranularity = outputGranularity;
_outputGranularityMillis = outputGranularity.granularityToMillis();
+ _bucketingChronology = bucketingTimeZone != null ?
ISOChronology.getInstance(bucketingTimeZone) : null;
+ _dateTime = new MutableDateTime(0L, DateTimeZone.UTC);
+ _printBuffer = new StringBuilder();
+
+ final int size = _outputGranularity.getSize();
// setup date time truncating based on output granularity
- final int sz = _outputGranularity.getSize();
+ //when size == 1, skip the needless set() calls
Review Comment:
(nit) Put a space after `//` for consistency
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/transformer/datetime/BaseDateTimeTransformer.java:
##########
@@ -58,41 +74,145 @@ public BaseDateTimeTransformer(@Nonnull DateTimeFormatSpec
inputFormat, @Nonnull
_outputDateTimeFormatter = outputFormat.getDateTimeFormatter();
_outputGranularity = outputGranularity;
_outputGranularityMillis = outputGranularity.granularityToMillis();
+ _bucketingChronology = bucketingTimeZone != null ?
ISOChronology.getInstance(bucketingTimeZone) : null;
+ _dateTime = new MutableDateTime(0L, DateTimeZone.UTC);
+ _printBuffer = new StringBuilder();
+
+ final int size = _outputGranularity.getSize();
// setup date time truncating based on output granularity
- final int sz = _outputGranularity.getSize();
+ //when size == 1, skip the needless set() calls
switch (_outputGranularity.getTimeUnit()) {
case MILLISECONDS:
- _dateTimeTruncate = (dateTime) -> _outputDateTimeFormatter
- .print(dateTime.withMillisOfSecond((dateTime.getMillisOfSecond() /
sz) * sz));
+ if (size != 1) {
+ _sdfDateTimeTruncate = (dateTime) -> {
+ dateTime.setMillisOfSecond((dateTime.getMillisOfSecond() / size) *
size);
+ return print(dateTime);
+ };
+ _millisDateTimeTruncate = (dateTime) -> {
+ dateTime.setMillisOfSecond((dateTime.getMillisOfSecond() / size) *
size);
+ return dateTime.getMillis();
+ };
+ } else {
+ _sdfDateTimeTruncate = (dateTime) -> print(dateTime);
+ _millisDateTimeTruncate = (dateTime) -> dateTime.getMillis();
+ }
break;
case SECONDS:
- _dateTimeTruncate = (dateTime) -> _outputDateTimeFormatter.print(
- dateTime.withSecondOfMinute((dateTime.getSecondOfMinute() / sz) *
sz).secondOfMinute().roundFloorCopy());
+ if (size != 1) {
+ _sdfDateTimeTruncate = (dateTime) -> {
+ dateTime.setSecondOfMinute((dateTime.getSecondOfMinute() / size) *
size);
+ dateTime.secondOfMinute().roundFloor();
+ return print(dateTime);
+ };
+ _millisDateTimeTruncate = (dateTime) -> {
+ dateTime.setSecondOfMinute((dateTime.getSecondOfMinute() / size) *
size);
+ dateTime.secondOfMinute().roundFloor();
+ return dateTime.getMillis();
+ };
+ } else {
+ _sdfDateTimeTruncate = (dateTime) -> {
+ dateTime.secondOfMinute().roundFloor();
+ return print(dateTime);
+ };
+ _millisDateTimeTruncate = (dateTime) -> {
+ dateTime.secondOfMinute().roundFloor();
+ return dateTime.getMillis();
+ };
+ }
break;
case MINUTES:
- _dateTimeTruncate = (dateTime) -> _outputDateTimeFormatter
- .print(dateTime.withMinuteOfHour((dateTime.getMinuteOfHour() / sz)
* sz).minuteOfHour().roundFloorCopy());
+ if (size != 1) {
+ _sdfDateTimeTruncate = (dateTime) -> {
+ dateTime.setMinuteOfHour((dateTime.getMinuteOfHour() / size) *
size);
+ dateTime.minuteOfHour().roundFloor();
+ return print(dateTime);
+ };
+ _millisDateTimeTruncate = (dateTime) -> {
+ dateTime.setMinuteOfHour((dateTime.getMinuteOfHour() / size) *
size);
+ dateTime.minuteOfHour().roundFloor();
+ return dateTime.getMillis();
+ };
+ } else {
+ _sdfDateTimeTruncate = (dateTime) -> {
+ dateTime.minuteOfHour().roundFloor();
+ return print(dateTime);
+ };
+ _millisDateTimeTruncate = (dateTime) -> {
+ dateTime.minuteOfHour().roundFloor();
+ return dateTime.getMillis();
+ };
+ }
break;
case HOURS:
- _dateTimeTruncate = (dateTime) -> _outputDateTimeFormatter
- .print(dateTime.withHourOfDay((dateTime.getHourOfDay() / sz) *
sz).hourOfDay().roundFloorCopy());
+ if (size != 1) {
+ _sdfDateTimeTruncate = (dateTime) -> {
+ dateTime.setHourOfDay((dateTime.getHourOfDay() / size) * size);
+ dateTime.hourOfDay().roundFloor();
+ return print(dateTime);
+ };
+ _millisDateTimeTruncate = (dateTime) -> {
+ dateTime.setHourOfDay((dateTime.getHourOfDay() / size) * size);
+ dateTime.hourOfDay().roundFloor();
+ return dateTime.getMillis();
+ };
+ } else {
+ _sdfDateTimeTruncate = (dateTime) -> {
+ dateTime.hourOfDay().roundFloor();
+ return print(dateTime);
+ };
+ _millisDateTimeTruncate = (dateTime) -> {
+ dateTime.hourOfDay().roundFloor();
+ return dateTime.getMillis();
+ };
+ }
break;
case DAYS:
- _dateTimeTruncate = (dateTime) -> _outputDateTimeFormatter
- .print(dateTime.withDayOfMonth(((dateTime.getDayOfMonth() - 1) /
sz) * sz + 1).dayOfMonth()
- .roundFloorCopy());
+ if (size != 1) {
+ _sdfDateTimeTruncate = (dateTime) -> {
+ dateTime.setDayOfMonth(((dateTime.getDayOfMonth() - 1) / size) *
size + 1);
+ dateTime.dayOfMonth().roundFloor();
+ return print(dateTime);
+ };
+ _millisDateTimeTruncate = (dateTime) -> {
+ dateTime.setDayOfMonth(((dateTime.getDayOfMonth() - 1) / size) *
size + 1);
+ dateTime.dayOfMonth().roundFloor();
+ return dateTime.getMillis();
+ };
+ } else {
+ _sdfDateTimeTruncate = (dateTime) -> {
+ dateTime.dayOfMonth().roundFloor();
+ return print(dateTime);
+ };
+ _millisDateTimeTruncate = (dateTime) -> {
+ dateTime.dayOfMonth().roundFloor();
+ return dateTime.getMillis();
+ };
+ }
break;
default:
- _dateTimeTruncate = _outputDateTimeFormatter::print;
+ _sdfDateTimeTruncate = _outputDateTimeFormatter::print;
Review Comment:
Should we throw exception here?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]