Copilot commented on code in PR #18428:
URL: https://github.com/apache/pinot/pull/18428#discussion_r3191233919
##########
pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java:
##########
@@ -1796,121 +1832,126 @@ public PinotDataType getSingleValueType() {
}
}
- public static PinotDataType getSingleValueType(Class<?> cls) {
- if (cls == Integer.class) {
+ /// Returns the [PinotDataType] for the given single value, dispatched on
the runtime class via
+ /// `instanceof`. Returns [#OBJECT] for any unrecognized type. Subclasses of
non-final types
+ /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are
matched by their parent type.
+ public static PinotDataType getSingleValueType(Object value) {
Review Comment:
Replacing the public `Class<?>` entry points with value-only signatures is a
source/binary incompatible API change. Downstream code compiled against
`getSingleValueType(Class<?>)` / `getMultiValueType(Class<?>)` will no longer
link, and callers that only have a declared type lose the only supported path
for resolving types without materializing a sample instance. Keeping the old
overloads as deprecated shims would preserve compatibility.
##########
pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java:
##########
@@ -1796,121 +1832,126 @@ public PinotDataType getSingleValueType() {
}
}
- public static PinotDataType getSingleValueType(Class<?> cls) {
- if (cls == Integer.class) {
+ /// Returns the [PinotDataType] for the given single value, dispatched on
the runtime class via
+ /// `instanceof`. Returns [#OBJECT] for any unrecognized type. Subclasses of
non-final types
+ /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are
matched by their parent type.
+ public static PinotDataType getSingleValueType(Object value) {
+ if (value instanceof Integer) {
return INTEGER;
}
- if (cls == Long.class) {
+ if (value instanceof Long) {
return LONG;
}
- if (cls == Float.class) {
+ if (value instanceof Float) {
return FLOAT;
}
- if (cls == Double.class) {
+ if (value instanceof Double) {
return DOUBLE;
}
- if (cls == BigDecimal.class) {
+ if (value instanceof BigDecimal) {
return BIG_DECIMAL;
}
- if (cls == String.class) {
- return STRING;
- }
- if (cls == byte[].class) {
- return BYTES;
- }
- if (cls == UUID.class) {
- return UUID;
- }
- if (cls == Boolean.class) {
+ if (value instanceof Boolean) {
return BOOLEAN;
}
- if (cls == Timestamp.class) {
+ if (value instanceof Timestamp) {
return TIMESTAMP;
}
- if (cls != null && Map.class.isAssignableFrom(cls)) {
+ if (value instanceof String) {
+ return STRING;
+ }
+ if (value instanceof byte[]) {
+ return BYTES;
+ }
+ if (value instanceof Map) {
return MAP;
}
- if (cls == LocalDate.class) {
+ if (value instanceof LocalDate) {
return DATE;
}
- if (cls == LocalTime.class) {
+ if (value instanceof LocalTime) {
return TIME;
}
- if (cls == Byte.class) {
+ if (value instanceof UUID) {
+ return UUID;
+ }
+ if (value instanceof Byte) {
return BYTE;
}
- if (cls == Character.class) {
+ if (value instanceof Character) {
return CHARACTER;
}
- if (cls == Short.class) {
+ if (value instanceof Short) {
return SHORT;
}
return OBJECT;
}
- public static PinotDataType getMultiValueType(Class<?> cls) {
- if (cls == Integer.class) {
+ /// Returns the multi-value [PinotDataType] for the given sample element,
dispatched on the runtime class
+ /// via `instanceof`. Returns [#OBJECT_ARRAY] for any unrecognized type.
+ public static PinotDataType getMultiValueType(Object element) {
Review Comment:
Replacing the public `Class<?>` entry points with value-only signatures is a
source/binary incompatible API change. Downstream code compiled against
`getSingleValueType(Class<?>)` / `getMultiValueType(Class<?>)` will no longer
link, and callers that only have a declared type lose the only supported path
for resolving types without materializing a sample instance. Keeping the old
overloads as deprecated shims would preserve compatibility.
##########
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionUtils.java:
##########
@@ -149,26 +99,55 @@ public static PinotDataType getParameterType(Class<?>
clazz) {
return PARAMETER_TYPE_MAP.get(clazz);
}
- /**
- * Returns the corresponding PinotDataType for the given argument class, or
{@code null} if there is no one matching.
- */
- @Nullable
- public static PinotDataType getArgumentType(Class<?> clazz) {
- if (Collection.class.isAssignableFrom(clazz)) {
- return PinotDataType.COLLECTION;
+ /// Returns the corresponding [PinotDataType] for the given argument value
(the actual value passed into
+ /// the function). Returns [PinotDataType#OBJECT] /
[PinotDataType#OBJECT_ARRAY] for unrecognized types,
+ /// matching [PinotDataType#getSingleValueType]'s best-effort fallback.
Subclasses of non-final types
+ /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are
matched by their parent type.
+ ///
+ /// Dispatch (single-value first since it's the dominant case for function
arguments):
+ /// - Single values → delegated to [PinotDataType#getSingleValueType]
(covers all scalar types
+ /// including `byte[]` → [PinotDataType#BYTES]).
+ /// - Reference arrays (`Object[]` and subtypes including `byte[][]`) →
first non-null element is
+ /// sampled and [PinotDataType#getMultiValueType] is consulted. Empty /
all-null reference arrays
+ /// fall back to [PinotDataType#OBJECT_ARRAY] since the element type is
undeterminable.
+ /// - Primitive arrays (`int[]` / `long[]` / `float[]` / `double[]` /
`boolean[]`) → handled here, since
+ /// they can't be element-sampled into a boxed type.
+ /// - [PinotDataType#COLLECTION] for any [Collection]; otherwise falls back
to [PinotDataType#OBJECT].
+ public static PinotDataType getArgumentType(Object value) {
+ PinotDataType singleValueType = PinotDataType.getSingleValueType(value);
+ if (singleValueType != PinotDataType.OBJECT) {
+ return singleValueType;
}
Review Comment:
This replaces the public `getArgumentType(Class<?>)` API with a value-based
variant, and the same PR also removes `getDataType(Class<?>)`. That is a
breaking change for downstream code that performs type resolution from declared
classes or method signatures rather than live values. Please keep the
`Class<?>` APIs as compatibility shims and deprecate them before removal.
##########
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionUtils.java:
##########
@@ -149,26 +99,55 @@ public static PinotDataType getParameterType(Class<?>
clazz) {
return PARAMETER_TYPE_MAP.get(clazz);
}
- /**
- * Returns the corresponding PinotDataType for the given argument class, or
{@code null} if there is no one matching.
- */
- @Nullable
- public static PinotDataType getArgumentType(Class<?> clazz) {
- if (Collection.class.isAssignableFrom(clazz)) {
- return PinotDataType.COLLECTION;
+ /// Returns the corresponding [PinotDataType] for the given argument value
(the actual value passed into
+ /// the function). Returns [PinotDataType#OBJECT] /
[PinotDataType#OBJECT_ARRAY] for unrecognized types,
+ /// matching [PinotDataType#getSingleValueType]'s best-effort fallback.
Subclasses of non-final types
+ /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are
matched by their parent type.
+ ///
+ /// Dispatch (single-value first since it's the dominant case for function
arguments):
+ /// - Single values → delegated to [PinotDataType#getSingleValueType]
(covers all scalar types
+ /// including `byte[]` → [PinotDataType#BYTES]).
+ /// - Reference arrays (`Object[]` and subtypes including `byte[][]`) →
first non-null element is
+ /// sampled and [PinotDataType#getMultiValueType] is consulted. Empty /
all-null reference arrays
+ /// fall back to [PinotDataType#OBJECT_ARRAY] since the element type is
undeterminable.
+ /// - Primitive arrays (`int[]` / `long[]` / `float[]` / `double[]` /
`boolean[]`) → handled here, since
+ /// they can't be element-sampled into a boxed type.
+ /// - [PinotDataType#COLLECTION] for any [Collection]; otherwise falls back
to [PinotDataType#OBJECT].
+ public static PinotDataType getArgumentType(Object value) {
+ PinotDataType singleValueType = PinotDataType.getSingleValueType(value);
+ if (singleValueType != PinotDataType.OBJECT) {
+ return singleValueType;
}
- if (Map.class.isAssignableFrom(clazz)) {
- return PinotDataType.MAP;
+ if (value instanceof Object[]) {
+ Object[] array = (Object[]) value;
+ for (Object element : array) {
+ if (element == null) {
+ continue;
+ }
+ return PinotDataType.getMultiValueType(element);
+ }
+ // Empty or all-null reference array — element type undeterminable.
+ return PinotDataType.OBJECT_ARRAY;
}
Review Comment:
This regresses typed empty/all-null reference arrays. `new Integer[0]`, `new
Timestamp[0]`, `new String[0]`, etc. now resolve to `OBJECT_ARRAY` even though
the runtime array class still exposes the component type. Callers such as
`BaseDefaultColumnHandler.createDerivedColumnV1Indices` infer column type from
the first produced value, so an early empty array can permanently lock the
derived column to `OBJECT_ARRAY` and break later writes/conversions. Fall back
to the array component type before returning `OBJECT_ARRAY`.
##########
pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java:
##########
@@ -1796,121 +1832,126 @@ public PinotDataType getSingleValueType() {
}
}
- public static PinotDataType getSingleValueType(Class<?> cls) {
- if (cls == Integer.class) {
+ /// Returns the [PinotDataType] for the given single value, dispatched on
the runtime class via
+ /// `instanceof`. Returns [#OBJECT] for any unrecognized type. Subclasses of
non-final types
+ /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are
matched by their parent type.
+ public static PinotDataType getSingleValueType(Object value) {
+ if (value instanceof Integer) {
return INTEGER;
}
- if (cls == Long.class) {
+ if (value instanceof Long) {
return LONG;
}
- if (cls == Float.class) {
+ if (value instanceof Float) {
return FLOAT;
}
- if (cls == Double.class) {
+ if (value instanceof Double) {
return DOUBLE;
}
- if (cls == BigDecimal.class) {
+ if (value instanceof BigDecimal) {
return BIG_DECIMAL;
}
- if (cls == String.class) {
- return STRING;
- }
- if (cls == byte[].class) {
- return BYTES;
- }
- if (cls == UUID.class) {
- return UUID;
- }
- if (cls == Boolean.class) {
+ if (value instanceof Boolean) {
return BOOLEAN;
}
- if (cls == Timestamp.class) {
+ if (value instanceof Timestamp) {
return TIMESTAMP;
}
- if (cls != null && Map.class.isAssignableFrom(cls)) {
+ if (value instanceof String) {
+ return STRING;
+ }
+ if (value instanceof byte[]) {
+ return BYTES;
+ }
+ if (value instanceof Map) {
return MAP;
}
- if (cls == LocalDate.class) {
+ if (value instanceof LocalDate) {
return DATE;
}
- if (cls == LocalTime.class) {
+ if (value instanceof LocalTime) {
return TIME;
}
- if (cls == Byte.class) {
+ if (value instanceof UUID) {
+ return UUID;
+ }
+ if (value instanceof Byte) {
return BYTE;
}
- if (cls == Character.class) {
+ if (value instanceof Character) {
return CHARACTER;
}
- if (cls == Short.class) {
+ if (value instanceof Short) {
return SHORT;
}
return OBJECT;
}
- public static PinotDataType getMultiValueType(Class<?> cls) {
- if (cls == Integer.class) {
+ /// Returns the multi-value [PinotDataType] for the given sample element,
dispatched on the runtime class
+ /// via `instanceof`. Returns [#OBJECT_ARRAY] for any unrecognized type.
+ public static PinotDataType getMultiValueType(Object element) {
Review Comment:
The previous tests covered `null`, but the updated suite no longer asserts
the `null` fallbacks for these new value-based overloads. Since the new
implementation depends on `instanceof` semantics to return `OBJECT` /
`OBJECT_ARRAY` for `null`, please add explicit `getSingleValueType(null)` and
`getMultiValueType(null)` assertions so this contract is locked down.
##########
pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java:
##########
@@ -1230,6 +1230,24 @@ public Integer[] toInternal(Object value) {
}
},
+ BOOLEAN_ARRAY {
+ @Override
+ public Boolean[] convert(Object value, PinotDataType sourceType) {
+ return sourceType.toBooleanArray(value);
+ }
+
+ @Override
+ public Integer[] toInternal(Object value) {
+ Boolean[] booleanArray = (Boolean[]) value;
+ int length = booleanArray.length;
+ Integer[] intArray = new Integer[length];
+ for (int i = 0; i < length; i++) {
+ intArray[i] = booleanArray[i] != null ? (booleanArray[i] ? 1 : 0) :
null;
+ }
Review Comment:
This repurposes the existing public `BOOLEAN_ARRAY` enum constant from
carrying `boolean[]` to carrying `Boolean[]`. Any existing caller that still
passes primitive `boolean[]` values under `BOOLEAN_ARRAY` will now fail at the
cast on line 1241, so this is a runtime-breaking contract change rather than a
purely internal refactor. The old `BOOLEAN_ARRAY` behavior should remain
backward-compatible if you need to add a boxed variant.
##########
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionUtils.java:
##########
@@ -149,26 +99,55 @@ public static PinotDataType getParameterType(Class<?>
clazz) {
return PARAMETER_TYPE_MAP.get(clazz);
}
- /**
- * Returns the corresponding PinotDataType for the given argument class, or
{@code null} if there is no one matching.
- */
- @Nullable
- public static PinotDataType getArgumentType(Class<?> clazz) {
- if (Collection.class.isAssignableFrom(clazz)) {
- return PinotDataType.COLLECTION;
+ /// Returns the corresponding [PinotDataType] for the given argument value
(the actual value passed into
+ /// the function). Returns [PinotDataType#OBJECT] /
[PinotDataType#OBJECT_ARRAY] for unrecognized types,
+ /// matching [PinotDataType#getSingleValueType]'s best-effort fallback.
Subclasses of non-final types
+ /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are
matched by their parent type.
+ ///
+ /// Dispatch (single-value first since it's the dominant case for function
arguments):
+ /// - Single values → delegated to [PinotDataType#getSingleValueType]
(covers all scalar types
+ /// including `byte[]` → [PinotDataType#BYTES]).
+ /// - Reference arrays (`Object[]` and subtypes including `byte[][]`) →
first non-null element is
+ /// sampled and [PinotDataType#getMultiValueType] is consulted. Empty /
all-null reference arrays
+ /// fall back to [PinotDataType#OBJECT_ARRAY] since the element type is
undeterminable.
+ /// - Primitive arrays (`int[]` / `long[]` / `float[]` / `double[]` /
`boolean[]`) → handled here, since
+ /// they can't be element-sampled into a boxed type.
+ /// - [PinotDataType#COLLECTION] for any [Collection]; otherwise falls back
to [PinotDataType#OBJECT].
+ public static PinotDataType getArgumentType(Object value) {
+ PinotDataType singleValueType = PinotDataType.getSingleValueType(value);
+ if (singleValueType != PinotDataType.OBJECT) {
+ return singleValueType;
}
Review Comment:
The old `Class<?>` helper accepted `null`, and this replacement still has a
`null` path through `getSingleValueType(value)`. `FunctionUtilsTest` doesn't
cover `getArgumentType(null)`, so a future refactor could accidentally change
the fallback from `OBJECT` without tripping the new suite.
##########
pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java:
##########
@@ -1796,121 +1832,126 @@ public PinotDataType getSingleValueType() {
}
}
- public static PinotDataType getSingleValueType(Class<?> cls) {
- if (cls == Integer.class) {
+ /// Returns the [PinotDataType] for the given single value, dispatched on
the runtime class via
+ /// `instanceof`. Returns [#OBJECT] for any unrecognized type. Subclasses of
non-final types
+ /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are
matched by their parent type.
+ public static PinotDataType getSingleValueType(Object value) {
Review Comment:
The previous tests covered `null`, but the updated suite no longer asserts
the `null` fallbacks for these new value-based overloads. Since the new
implementation depends on `instanceof` semantics to return `OBJECT` /
`OBJECT_ARRAY` for `null`, please add explicit `getSingleValueType(null)` and
`getMultiValueType(null)` assertions so this contract is locked down.
##########
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionUtils.java:
##########
@@ -149,26 +99,55 @@ public static PinotDataType getParameterType(Class<?>
clazz) {
return PARAMETER_TYPE_MAP.get(clazz);
}
- /**
- * Returns the corresponding PinotDataType for the given argument class, or
{@code null} if there is no one matching.
- */
- @Nullable
- public static PinotDataType getArgumentType(Class<?> clazz) {
- if (Collection.class.isAssignableFrom(clazz)) {
- return PinotDataType.COLLECTION;
+ /// Returns the corresponding [PinotDataType] for the given argument value
(the actual value passed into
+ /// the function). Returns [PinotDataType#OBJECT] /
[PinotDataType#OBJECT_ARRAY] for unrecognized types,
+ /// matching [PinotDataType#getSingleValueType]'s best-effort fallback.
Subclasses of non-final types
+ /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are
matched by their parent type.
+ ///
+ /// Dispatch (single-value first since it's the dominant case for function
arguments):
+ /// - Single values → delegated to [PinotDataType#getSingleValueType]
(covers all scalar types
+ /// including `byte[]` → [PinotDataType#BYTES]).
+ /// - Reference arrays (`Object[]` and subtypes including `byte[][]`) →
first non-null element is
+ /// sampled and [PinotDataType#getMultiValueType] is consulted. Empty /
all-null reference arrays
+ /// fall back to [PinotDataType#OBJECT_ARRAY] since the element type is
undeterminable.
+ /// - Primitive arrays (`int[]` / `long[]` / `float[]` / `double[]` /
`boolean[]`) → handled here, since
+ /// they can't be element-sampled into a boxed type.
+ /// - [PinotDataType#COLLECTION] for any [Collection]; otherwise falls back
to [PinotDataType#OBJECT].
+ public static PinotDataType getArgumentType(Object value) {
+ PinotDataType singleValueType = PinotDataType.getSingleValueType(value);
+ if (singleValueType != PinotDataType.OBJECT) {
+ return singleValueType;
}
- if (Map.class.isAssignableFrom(clazz)) {
- return PinotDataType.MAP;
+ if (value instanceof Object[]) {
+ Object[] array = (Object[]) value;
+ for (Object element : array) {
+ if (element == null) {
+ continue;
+ }
+ return PinotDataType.getMultiValueType(element);
+ }
+ // Empty or all-null reference array — element type undeterminable.
+ return PinotDataType.OBJECT_ARRAY;
}
- return ARGUMENT_TYPE_MAP.get(clazz);
- }
-
- /**
- * Returns the corresponding DataType for the given class, or {@code null}
if there is no one matching.
- */
- @Nullable
- public static DataType getDataType(Class<?> clazz) {
- return DATA_TYPE_MAP.get(clazz);
+ if (value instanceof int[]) {
+ return PinotDataType.PRIMITIVE_INT_ARRAY;
+ }
+ if (value instanceof long[]) {
+ return PinotDataType.PRIMITIVE_LONG_ARRAY;
+ }
+ if (value instanceof float[]) {
+ return PinotDataType.PRIMITIVE_FLOAT_ARRAY;
+ }
+ if (value instanceof double[]) {
+ return PinotDataType.PRIMITIVE_DOUBLE_ARRAY;
+ }
+ if (value instanceof boolean[]) {
+ return PinotDataType.PRIMITIVE_BOOLEAN_ARRAY;
+ }
+ if (value instanceof Collection) {
+ return PinotDataType.COLLECTION;
+ }
+ return PinotDataType.OBJECT;
Review Comment:
The old `Class<?>` helper accepted `null`, and this replacement still has a
`null` path through `getSingleValueType(value)`. `FunctionUtilsTest` doesn't
cover `getArgumentType(null)`, so a future refactor could accidentally change
the fallback from `OBJECT` without tripping the new suite.
##########
pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java:
##########
@@ -1731,19 +1743,42 @@ public UUID[] toUuidArray(Object value) {
}
}
+ private static Object[] toObjectArray(Object array) {
+ if (array instanceof Collection) {
+ return ((Collection<?>) array).toArray();
+ }
+ Class<?> componentType = array.getClass().getComponentType();
+ if (componentType.isPrimitive()) {
+ if (componentType == int.class) {
+ return ArrayUtils.toObject((int[]) array);
+ }
+ if (componentType == long.class) {
+ return ArrayUtils.toObject((long[]) array);
+ }
+ if (componentType == float.class) {
+ return ArrayUtils.toObject((float[]) array);
+ }
+ if (componentType == double.class) {
+ return ArrayUtils.toObject((double[]) array);
+ }
+ if (componentType == boolean.class) {
+ return ArrayUtils.toObject((boolean[]) array);
+ }
+ throw new UnsupportedOperationException("Unsupported primitive array
type: " + componentType);
+ } else {
+ return (Object[]) array;
+ }
+ }
+
public Object convert(Object value, PinotDataType sourceType) {
throw new UnsupportedOperationException("Cannot convert value from " +
sourceType + " to " + this);
}
- /**
- * Converts to the internal representation of the value.
- * <ul>
- * <li>BOOLEAN -> int</li>
- * <li>TIMESTAMP -> long</li>
- * <li>BOOLEAN_ARRAY -> int[]</li>
- * <li>TIMESTAMP_ARRAY -> long[]</li>
- * </ul>
- */
+ /// Converts to the internal representation of the value.
+ /// - `BOOLEAN` → `Integer` (0/1)
+ /// - `TIMESTAMP` → `Long` (epoch millis)
+ /// - `PRIMITIVE_BOOLEAN_ARRAY` / `BOOLEAN_ARRAY` → `Integer[]` (per-element
0/1)
+ /// - `TIMESTAMP_ARRAY` → `Long[]` (per-element epoch millis)
Review Comment:
This was previously Javadoc, but `///` is only a normal line comment in
Java. That means the public `toInternal` contract drops out of generated
Javadocs and IDE quick docs right when this PR changes the boolean-array
behavior. Please keep this as real Javadoc (`/** ... */`) so the updated API
contract remains discoverable.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]