Jackie-Jiang commented on PR #18428:
URL: https://github.com/apache/pinot/pull/18428#issuecomment-4383420609

   Summarizing replies to Copilot's review:
   
   ## Re: `Class<?>` API removal (3 comments on `getSingleValueType` / 
`getMultiValueType` / `getArgumentType`)
   
   Deliberate. The old `Class<?>`-based API was the source of the bug this PR 
fixes — exact-class matching against `Timestamp.class` silently miscategorized 
vendor JDBC subclasses (e.g. BigQuery Simba's `TimestampTz`) as `OBJECT`, which 
broke downstream conversion. A `Class<?>` shim that delegates to the new 
value-based form would either need to materialize a sentinel instance per 
`Class<?>` (impossible without per-class registration) or perpetuate the 
original bug. Going through `value.getClass()` at the existing call sites is 
the cleaner contract — every internal caller already had a value in hand and 
was paying `.getClass()` boilerplate to feed the Class-based API. The four call 
sites in OSS (`FunctionInvoker`, `BaseDefaultColumnHandler`, 
`DataTypeConversionFunctions`, `MapColumnPreIndexStatsCollector`, 
`DataTypeTransformerUtils`) all pass values now, none lost functionality. Same 
reasoning for removing `getDataType` — it was unused and conceptually broken 
(mapped Java *arra
 y* classes to the *element-type* `DataType`, losing the SV/MV distinction that 
`FieldSpec` tracks separately).
   
   ## Re: `BOOLEAN_ARRAY` repurposing
   
   Deliberate. Pre-PR, `BOOLEAN_ARRAY` used `boolean[]` storage while every 
*other* `*_ARRAY` type used the boxed form (`INTEGER_ARRAY`/`LONG_ARRAY`/etc.); 
the parallel `PRIMITIVE_INT_ARRAY`/`PRIMITIVE_LONG_ARRAY` already existed. This 
PR splits the boolean form to match: `PRIMITIVE_BOOLEAN_ARRAY` carries 
`boolean[]` (parallel to `PRIMITIVE_INT_ARRAY`), `BOOLEAN_ARRAY` carries 
`Boolean[]` (parallel to `INTEGER_ARRAY`). The naming is now uniform across all 
numeric types. Pinot's actual ingestion path goes through 
`DataTypeTransformer.transform` → `BOOLEAN_ARRAY.convert(value, sourceType)` → 
`BOOLEAN_ARRAY.toInternal(...)`, and the new `convert` produces `Boolean[]` 
matching the new `toInternal` cast — so the chain stays internally consistent. 
Direct external use of `PinotDataType.BOOLEAN_ARRAY.toInternal(boolean[])` is 
rare to nonexistent in the codebase.
   
   ## Re: empty/all-null typed reference arrays in `getArgumentType`
   
   The collapse to `OBJECT_ARRAY` for empty/all-null arrays is harmless. The 
cited concern about `BaseDefaultColumnHandler.createDerivedColumnV1Indices` 
doesn't materialize: `outputValueType` is only used to pick which `to*Array` 
method to call, and every `to*Array` default impl starts with an `instanceof` 
short-circuit on the *destination* boxed-array type:
   
   ```java
   public Integer[] toIntegerArray(Object value) {
     if (value instanceof Integer[]) return (Integer[]) value;
     ...
   }
   ```
   
   So if a later row produces a typed `Integer[]`, 
`OBJECT_ARRAY.toIntegerArray(Integer[])` (the inherited default) returns it 
as-is regardless of the cached `outputValueType`. Mixed-type arrays are handled 
by the per-element `catch CCE → anyToInt(element)` fallback inside the loop. 
The dispatch label and the conversion result are decoupled here — the label 
only steers method selection.
   
   ## Re: `///` Javadoc on `toInternal`
   
   `///` is JEP 467 Markdown Javadoc — proper Javadoc, not a line comment, 
since Java 23. Pinot already uses this convention widely. Tooling (javadoc, 
IDEs, IntelliJ) renders it.
   
   ## Re: `null` fallback test coverage (3 comments)
   
   The API contract is non-null after this PR (note: `@Nullable` was dropped on 
the value parameter). The fact that `instanceof null` returns false and the 
chain falls through to `OBJECT`/`OBJECT_ARRAY` is an incidental implementation 
detail, not a documented contract. Adding `getSingleValueType(null)` / 
`getMultiValueType(null)` / `getArgumentType(null)` assertions would lock down 
behavior we don't intend to guarantee — callers with a `null` value should 
pre-check, not rely on `OBJECT` fallthrough.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to