Jackie-Jiang opened a new pull request, #18428:
URL: https://github.com/apache/pinot/pull/18428

   ## Summary
   
   Replaces the `Class`-based map / chain dispatch in 
`PinotDataType.getSingleValueType` / `getMultiValueType` and 
`FunctionUtils.getArgumentType` with an `instanceof` chain that takes the value 
directly. Fixes a long-standing exact-class match bug where vendor JDBC 
`Timestamp` subclasses (e.g. BigQuery Simba's `TimestampTz`) fell through to 
`OBJECT` and broke downstream conversion.
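
For illustration, the exact-class failure mode can be reproduced outside Pinot. The sketch below is not Pinot code: `DispatchSketch`, `VendorTimestamp`, `byClass`, `byInstanceOf`, and the string labels are hypothetical stand-ins, with `VendorTimestamp` playing the role of a driver-specific `Timestamp` subclass.

```java
import java.sql.Timestamp;
import java.util.HashMap;
import java.util.Map;

class DispatchSketch {
  // Hypothetical stand-in for a vendor driver's Timestamp subclass.
  static class VendorTimestamp extends Timestamp {
    VendorTimestamp(long millis) {
      super(millis);
    }
  }

  // Exact-class lookup: only the precise runtime class is a key.
  static final Map<Class<?>, String> BY_CLASS = new HashMap<>();
  static {
    BY_CLASS.put(Integer.class, "INTEGER");
    BY_CLASS.put(Timestamp.class, "TIMESTAMP");
    BY_CLASS.put(String.class, "STRING");
  }

  static String byClass(Object value) {
    // VendorTimestamp.class is not a key, so the lookup falls back to OBJECT.
    return BY_CLASS.getOrDefault(value.getClass(), "OBJECT");
  }

  // instanceof chain: a subclass instance matches its parent's branch.
  static String byInstanceOf(Object value) {
    if (value instanceof Integer) {
      return "INTEGER";
    }
    if (value instanceof Timestamp) {
      return "TIMESTAMP";
    }
    if (value instanceof String) {
      return "STRING";
    }
    return "OBJECT";
  }

  public static void main(String[] args) {
    Object value = new VendorTimestamp(0L);
    System.out.println(byClass(value));      // prints OBJECT
    System.out.println(byInstanceOf(value)); // prints TIMESTAMP
  }
}
```

Taking the value rather than its `Class` is what makes the `instanceof` form possible at all, which is presumably why the signatures change along with the dispatch.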
   
   ## PinotDataType
   
   - **`getSingleValueType(Class<?>)` → `getSingleValueType(Object)`** and 
**`getMultiValueType(Class<?>)` → `getMultiValueType(Object)`** — `instanceof` 
dispatch in canonical Pinot type order. Always non-null (`OBJECT` / 
`OBJECT_ARRAY` for unrecognized types). Subclasses of non-final types 
(`Timestamp`, `Map`, etc.) match their parent type naturally.
   - **Split `BOOLEAN_ARRAY` into `PRIMITIVE_BOOLEAN_ARRAY` (`boolean[]`) and 
`BOOLEAN_ARRAY` (`Boolean[]`)** — parallel to `PRIMITIVE_INT_ARRAY` / 
`INTEGER_ARRAY`. Fixes the silent asymmetry where `BOOLEAN_ARRAY` was stored as 
a primitive array while every other `*_ARRAY` was boxed.
   - **Rename `toBooleanArray` (returns `boolean[]`) to 
`toPrimitiveBooleanArray`**; new `toBooleanArray` returns `Boolean[]`. Matches 
`int` / `long` / `float` / `double` naming.
   - **`toObjectArray` now handles `boolean[]`** alongside `int[]` / `long[]` / 
`float[]` / `double[]`.
   - **Reorder default `to*Array` methods** to canonical Pinot type order (`INT 
→ LONG → FLOAT → DOUBLE → BIG_DECIMAL → BOOLEAN → TIMESTAMP → STRING → BYTES → 
DATE → TIME → UUID`).
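
A minimal sketch of the primitive / boxed split, assuming nothing about Pinot's actual implementation. The method names mirror the ones listed above, but the class `BooleanArraySketch` and the method bodies are illustrative only, and the unboxing path assumes the input has no `null` elements.

```java
import java.lang.reflect.Array;

class BooleanArraySketch {
  // Boolean[] -> boolean[]; auto-unboxing throws NullPointerException on null elements.
  static boolean[] toPrimitiveBooleanArray(Boolean[] boxed) {
    boolean[] result = new boolean[boxed.length];
    for (int i = 0; i < boxed.length; i++) {
      result[i] = boxed[i];
    }
    return result;
  }

  // boolean[] -> Boolean[]; each element is auto-boxed.
  static Boolean[] toBooleanArray(boolean[] primitive) {
    Boolean[] result = new Boolean[primitive.length];
    for (int i = 0; i < primitive.length; i++) {
      result[i] = primitive[i];
    }
    return result;
  }

  // Any array -> Object[]; boolean[] is boxed the same way as int[] / long[] / etc.
  static Object[] toObjectArray(Object array) {
    if (array instanceof Object[]) {
      return (Object[]) array;
    }
    int length = Array.getLength(array); // IllegalArgumentException if not an array
    Object[] result = new Object[length];
    for (int i = 0; i < length; i++) {
      result[i] = Array.get(array, i); // reflection boxes primitive elements
    }
    return result;
  }
}
```

The reflective `toObjectArray` is a shortcut for brevity; an explicit `instanceof` branch per primitive array type would avoid the reflection overhead.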
   
   ## FunctionUtils
   
   - **`getArgumentType(Class<?>)` → `getArgumentType(Object)`**, always 
non-null. Delegates SV dispatch to `PinotDataType.getSingleValueType` and MV 
reference-array dispatch to `PinotDataType.getMultiValueType` via element 
sampling; primitive arrays handled locally (since they can't be element-sampled 
into a boxed type).
   - **Add `boolean[]` / `Timestamp[]` entries** to `PARAMETER_TYPE_MAP` and 
`COLUMN_DATA_TYPE_MAP` so scalar functions can declare these as parameter / 
return types.
   - **Remove unused `DATA_TYPE_MAP` and `getDataType`** — zero callers; the 
map mapped Java array classes to the element-type `DataType`, which lost the 
SV/MV distinction.
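
The dispatch split described for `getArgumentType` could look roughly like the sketch below. It is an assumption-laden illustration, not the PR's code: `ArgumentTypeSketch` and its methods are hypothetical, the string labels stand in for `PinotDataType` constants, and "element sampling" is assumed here to mean "use the first non-null element", which the description does not spell out.

```java
import java.sql.Timestamp;

class ArgumentTypeSketch {
  // Single-value dispatch by instanceof; unrecognized values map to OBJECT.
  static String singleValueType(Object value) {
    if (value instanceof Integer) {
      return "INTEGER";
    }
    if (value instanceof Boolean) {
      return "BOOLEAN";
    }
    if (value instanceof Timestamp) {
      return "TIMESTAMP";
    }
    if (value instanceof String) {
      return "STRING";
    }
    return "OBJECT";
  }

  // Reference arrays: sample an element to pick the array type
  // (assumption: the first non-null element decides).
  static String multiValueType(Object[] values) {
    for (Object element : values) {
      if (element != null) {
        return singleValueType(element) + "_ARRAY";
      }
    }
    return "OBJECT_ARRAY"; // empty or all-null array
  }

  // Primitive arrays are not Object[], so they are matched by their own type.
  static String argumentType(Object value) {
    if (value instanceof int[]) {
      return "PRIMITIVE_INT_ARRAY";
    }
    if (value instanceof boolean[]) {
      return "PRIMITIVE_BOOLEAN_ARRAY";
    }
    if (value instanceof Object[]) {
      return multiValueType((Object[]) value);
    }
    return singleValueType(value);
  }
}
```

Every path returns a label, which matches the "always non-null" contract stated above.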
   
   ## Caller updates
   
   Drops `.getClass()` at every call site:
   - `FunctionInvoker.convertTypes`
   - `BaseDefaultColumnHandler.createDerivedColumnV1Indices`
   - `DataTypeConversionFunctions.cast`
   - `MapColumnPreIndexStatsCollector.createKeyStatsCollector`
   - `DataTypeTransformerUtils.transformValue`
   
   ## Tests
   
   - New `FunctionUtilsTest` covering `getArgumentType` / `getParameterType` / 
`getColumnDataType`, including the vendor `Timestamp` subclass case and the new 
`boolean[]` / `Timestamp[]` map entries.
   - `PinotDataTypeTest` converted to value-based assertions in canonical 
order; adds `PRIMITIVE_BOOLEAN_ARRAY` ↔ `BOOLEAN_ARRAY` cross-form conversions 
and `Timestamp` subclass cases for both `getSingleValueType` and 
`getMultiValueType`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

