Jackie-Jiang opened a new pull request, #18428: URL: https://github.com/apache/pinot/pull/18428
## Summary

Replaces the `Class`-based map / chain dispatch in `PinotDataType.getSingleValueType` / `getMultiValueType` and `FunctionUtils.getArgumentType` with an `instanceof` chain that takes the value directly. Fixes a long-standing exact-class match bug where vendor JDBC `Timestamp` subclasses (e.g. BigQuery Simba's `TimestampTz`) fell through to `OBJECT` and broke downstream conversion.

## PinotDataType

- **`getSingleValueType(Class<?>)` → `getSingleValueType(Object)`** and **`getMultiValueType(Class<?>)` → `getMultiValueType(Object)`**: `instanceof` dispatch in canonical Pinot type order. Always non-null (`OBJECT` / `OBJECT_ARRAY` for unrecognized types). Subclasses of non-final types (`Timestamp`, `Map`, etc.) match their parent type naturally.
- **Split `BOOLEAN_ARRAY` into `PRIMITIVE_BOOLEAN_ARRAY` (`boolean[]`) and `BOOLEAN_ARRAY` (`Boolean[]`)**: parallel to `PRIMITIVE_INT_ARRAY` / `INTEGER_ARRAY`. Fixes the silent asymmetry where `BOOLEAN_ARRAY` stored as primitive while every other `*_ARRAY` was boxed.
- **Rename `toBooleanArray` (returns `boolean[]`) to `toPrimitiveBooleanArray`**; new `toBooleanArray` returns `Boolean[]`. Matches `int` / `long` / `float` / `double` naming.
- **`toObjectArray` now handles `boolean[]`** alongside `int[]` / `long[]` / `float[]` / `double[]`.
- **Reorder default `to*Array` methods** to canonical Pinot type order (`INT → LONG → FLOAT → DOUBLE → BIG_DECIMAL → BOOLEAN → TIMESTAMP → STRING → BYTES → DATE → TIME → UUID`).

## FunctionUtils

- **`getArgumentType(Class<?>)` → `getArgumentType(Object)`**, always non-null. Delegates SV dispatch to `PinotDataType.getSingleValueType` and MV reference-array dispatch to `PinotDataType.getMultiValueType` via element sampling; primitive arrays handled locally (since they can't be element-sampled into a boxed type).
- **Add `boolean[]` / `Timestamp[]` entries** to `PARAMETER_TYPE_MAP` and `COLUMN_DATA_TYPE_MAP` so scalar functions can declare these as parameter / return types.
- **Remove unused `DATA_TYPE_MAP` and `getDataType`**: zero callers; the map mapped Java array classes to the element-type `DataType`, which lost the SV/MV distinction.

## Caller updates

Drops `.getClass()` at every call site:

- `FunctionInvoker.convertTypes`
- `BaseDefaultColumnHandler.createDerivedColumnV1Indices`
- `DataTypeConversionFunctions.cast`
- `MapColumnPreIndexStatsCollector.createKeyStatsCollector`
- `DataTypeTransformerUtils.transformValue`

## Tests

- New `FunctionUtilsTest` covering `getArgumentType` / `getParameterType` / `getColumnDataType`, including the vendor `Timestamp` subclass case and the new `boolean[]` / `Timestamp[]` map entries.
- `PinotDataTypeTest` converted to value-based assertions in canonical order; adds `PRIMITIVE_BOOLEAN_ARRAY` ↔ `BOOLEAN_ARRAY` cross-form conversions and `Timestamp` subclass cases for both `getSingleValueType` and `getMultiValueType`.
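
## Illustration

The exact-class match bug described in the Summary can be reproduced with a minimal, self-contained sketch. Everything here is hypothetical and only meant to show the dispatch difference: `DispatchSketch`, `VendorTimestamp`, `byExactClass`, and `byInstanceof` are made-up names, the type names are plain strings rather than `PinotDataType` constants, and none of this is the actual Pinot code.

```java
import java.sql.Timestamp;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the Pinot implementation.
class DispatchSketch {
  // Stand-in for a vendor JDBC subclass of java.sql.Timestamp (assumed for illustration).
  static class VendorTimestamp extends Timestamp {
    VendorTimestamp(long millis) {
      super(millis);
    }
  }

  // Class-keyed lookup: only the exact runtime class matches a key.
  private static final Map<Class<?>, String> EXACT_CLASS_MAP = new HashMap<>();

  static {
    EXACT_CLASS_MAP.put(Integer.class, "INTEGER");
    EXACT_CLASS_MAP.put(Timestamp.class, "TIMESTAMP");
    EXACT_CLASS_MAP.put(String.class, "STRING");
  }

  static String byExactClass(Object value) {
    // VendorTimestamp.class is not a key, so a subclass instance falls through to OBJECT.
    return EXACT_CLASS_MAP.getOrDefault(value.getClass(), "OBJECT");
  }

  // Value-based dispatch: instanceof also matches subclasses of non-final types.
  static String byInstanceof(Object value) {
    if (value instanceof Integer) {
      return "INTEGER";
    }
    if (value instanceof Timestamp) {
      return "TIMESTAMP";
    }
    if (value instanceof String) {
      return "STRING";
    }
    // boolean[] and Boolean[] are distinct runtime types, so each gets its own branch.
    if (value instanceof boolean[]) {
      return "PRIMITIVE_BOOLEAN_ARRAY";
    }
    if (value instanceof Boolean[]) {
      return "BOOLEAN_ARRAY";
    }
    return "OBJECT";
  }

  public static void main(String[] args) {
    Object value = new VendorTimestamp(0L);
    System.out.println(byExactClass(value));  // OBJECT
    System.out.println(byInstanceof(value));  // TIMESTAMP
  }
}
```

Passing the value instead of its `Class` is what makes the `instanceof` chain possible, which is why the callers listed above no longer need `.getClass()`.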
