Copilot commented on code in PR #18436:
URL: https://github.com/apache/pinot/pull/18436#discussion_r3203263693


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/BaseRecordExtractor.java:
##########
@@ -18,46 +18,16 @@
  */
 package org.apache.pinot.spi.data.readers;
 
-import com.google.common.collect.Maps;
-import java.math.BigDecimal;
-import java.math.BigInteger;
-import java.nio.ByteBuffer;
 import java.sql.Timestamp;
 import java.util.Base64;
-import java.util.Collection;
-import java.util.Map;
 import java.util.Set;
 import javax.annotation.Nullable;
 import org.apache.commons.collections4.CollectionUtils;
-import org.apache.commons.lang3.ArrayUtils;
 
 
-/// Default [RecordExtractor] implementation. Subclasses override only the 
bits the format needs.
-///
-/// [#convert] dispatches by shape:
-/// - multi-value (`Collection` / non-`byte[]` array) → [#convertMultiValue] → 
`Object[]` (each element recursed)
-/// - map → [#convertMap] → `Map<String, Object>` (keys converted via 
[#convertSingleValue] then stringified via
-///   [#stringifyMapKey], each value recursed)
-/// - nested record → [#convertRecord] (throws by default; override for 
formats with nested records)
-/// - everything else → [#convertSingleValue]
-///
-/// [#convertSingleValue] applies universal normalizations for primitive types 
that any format extractor might
-/// produce:
-/// - `Byte` / `Short` widen to `Integer` so all small ints unify behind a 
single Pinot type
-/// - `BigInteger` widens to `BigDecimal` (Pinot has no `BigInteger` data 
type; downstream transforms handle
-///   `BigDecimal` natively)
-/// - other `Number` (`Integer` / `Long` / `Float` / `Double` / `BigDecimal`) 
passes through
-/// - `Boolean` passes through
-/// - `byte[]` passes through
-/// - `ByteBuffer` materializes to `byte[]` (sliced so the source buffer's 
position is not advanced)
-/// - everything else falls back to `value.toString()`
-///
-/// **Logical types (DECIMAL / TIMESTAMP / DATE / TIME / UUID) are NOT handled 
here** — see [RecordExtractor]
-/// for the contract. Format-specific extractors do the native-to-contract 
conversion themselves (e.g. the
-/// Avro extractor walks the schema in its own `extract` and never reaches 
this dispatcher).
+/// Default [RecordExtractor] base providing include-list resolution via 
[#init] and the [#stringifyMapKey] helper.
 ///
 /// @param <T> the format of the input record
-@SuppressWarnings({"rawtypes", "unchecked"})
 public abstract class BaseRecordExtractor<T> implements RecordExtractor<T> {
 

Review Comment:
   Removing the protected conversion dispatcher/helpers from 
BaseRecordExtractor is a source/binary incompatible change for any external 
RecordExtractor implementations that extend this SPI class and relied on 
convert()/convertMap()/convertSingleValue(), etc. Since this class lives in 
pinot-spi (published for external consumers), consider keeping the old methods 
as `@Deprecated` (possibly moved to a utility and delegated to) for at least 
one release cycle, or otherwise providing a compatibility layer/migration path.



##########
pinot-plugins/pinot-input-format/pinot-parquet/src/main/java/org/apache/pinot/plugin/inputformat/parquet/ParquetNativeRecordExtractor.java:
##########
@@ -51,32 +50,22 @@
 /// - `BINARY` with `STRING` / `ENUM` annotation → `String`
 /// - `BINARY` / `FIXED_LEN_BYTE_ARRAY` without annotation → `byte[]`
 ///
-/// **Logical types** — Parquet logical-type annotations decoded into uniform 
Java types regardless of the
-/// underlying physical encoding:
-/// - `DECIMAL` (on `INT32` / `INT64` / `BINARY` / `FIXED_LEN_BYTE_ARRAY`) → 
`BigDecimal`. Always converted —
-///   raw bytes aren't interpretable without external precision / scale
-/// - `INT64` + `TIMESTAMP_MILLIS` / `MICROS` / `NANOS` → `Timestamp` 
(sub-millisecond nanos preserved via
-///   `Timestamp#setNanos`), or `Long` raw value in the column's declared unit 
when raw
-/// - `INT96` → `Timestamp` (sub-millisecond nanos preserved), or `Long` epoch 
nanos when raw (nanos is
-///   INT96's natural unit since its physical encoding — nanos-of-day + Julian 
day — carries nanosecond
-///   precision)
-/// - `INT32` + `DATE` → `LocalDate`, or `Integer` raw days-since-epoch when 
raw
-/// - `INT32` + `TIME_MILLIS` → `LocalTime`, or `Integer` raw 
ms-since-midnight when raw
-/// - `INT64` + `TIME_MICROS` / `TIME_NANOS` → `LocalTime` (full nanosecond 
precision), or `Long` raw
-///   value-since-midnight in the column's declared unit when raw
-/// - `FIXED_LEN_BYTE_ARRAY(16)` + `UUID` → `java.util.UUID`. Always 
converted; the downstream type
-///   transformer adapts to the Pinot column's storage type — `STRING` → 
canonical form, `BYTES` →
-///   16-byte big-endian form
+/// **Logical types:**
+/// - `DECIMAL` (on `INT32` / `INT64` / `BINARY` / `FIXED_LEN_BYTE_ARRAY`) → 
`BigDecimal`
+/// - `INT64` + `TIMESTAMP_MILLIS` / `MICROS` / `NANOS` → `Timestamp`, or 
`Long` in the column's declared unit
+///   when `extractRawTimeValues` is `true`
+/// - `INT96` → `Timestamp`, or `Long` epoch nanos when `extractRawTimeValues` 
is `true`
+/// - `INT32` + `DATE` → `LocalDate`, or `Integer` days-since-epoch when 
`extractRawTimeValues` is `true`
+/// - `INT32` + `TIME_MILLIS` → `LocalTime`, or `Integer` ms-since-midnight 
when `extractRawTimeValues` is `true`
+/// - `INT64` + `TIME_MICROS` / `TIME_NANOS` → `LocalTime`, or `Long` 
value-since-midnight in the column's
+///   declared unit when `extractRawTimeValues` is `true`
+/// - `FIXED_LEN_BYTE_ARRAY(16)` + `UUID` → `java.util.UUID`
 ///
-/// [ParquetNativeRecordExtractorConfig#isExtractRawTimeValues] opts out of 
TIMESTAMP / DATE / TIME conversion.
-/// DECIMAL and UUID always convert.
-///
-/// **Multi-value:** `LIST`-annotated group (standard 3-level wrapper or 
legacy non-wrapper forms) → `Object[]`.
-///
-/// **Map / nested complex:** `MAP`-annotated group → `Map<Object, Object>`; 
plain non-annotated group →
-/// `Map<String, Object>` (struct).
-///
-/// **Null:** field with zero repetition count → `null`.
+/// **Complex types:**
+/// - `LIST`-annotated group (standard 3-level wrapper or legacy non-wrapper 
forms) → `Object[]`
+/// - `MAP`-annotated group → `Map<Object, Object>`

Review Comment:
   The class-level type matrix says MAP-annotated groups convert to 
`Map<Object, Object>`, but the implementation builds a `Map<String, Object>` 
and stringifies keys via stringifyMapKey(). Please update the documentation to 
reflect the actual contract/behavior (Map<String, Object>).
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to