Jackie-Jiang opened a new pull request, #18434:
URL: https://github.com/apache/pinot/pull/18434

   ## Summary
   
   Introduce `ArrowRecordExtractor` (extends `BaseRecordExtractor`) with 
schema-driven dispatch by `ArrowTypeID`; drop the bespoke 
`ArrowToGenericRowConverter`. The reader and decoder bind reader-scoped state 
once via `setReader(ArrowReader)`, which caches the dictionary map and 
pre-resolves the include list against the `VectorSchemaRoot`'s field vectors.
   
   Add `ArrowRecordExtractorConfig` with an `extractRawTimeValues` flag, matching 
the Avro / Parquet flag: when set, `Date` / `Time` / `Timestamp` values surface 
as raw `int` / `long` in the schema's unit instead of the contract Java type.
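
A minimal sketch of what the flag toggles for a day-unit `Date` column. It is not the extractor's code: the class and method names are hypothetical, and `java.time.LocalDate` is only an assumed stand-in for the converted form, since the actual contract types are not restated here.

```java
import java.time.LocalDate;

// Hypothetical illustration; not part of the PR.
final class RawTimeValueSketch {
  // epochDays: days since 1970-01-01, as stored in a day-unit Arrow date column.
  static Object extractDateDay(int epochDays, boolean extractRawTimeValues) {
    if (extractRawTimeValues) {
      // Raw mode: surface the stored integer unchanged, in the schema's unit.
      return Integer.valueOf(epochDays);
    }
    // Converted mode: LocalDate is an assumption made for this sketch only.
    return LocalDate.ofEpochDay(epochDays);
  }
}
```

With this sketch, `extractDateDay(1, true)` yields the `Integer` 1, while `extractDateDay(1, false)` yields the date 1970-01-02.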
   
   `ArrowMessageDecoder.decode` now branches on row count:
   - `0` → `null`
   - `1` → fields populated directly into the destination
   - `>1` → wrapped under `GenericRow.MULTIPLE_RECORDS_KEY`
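
The branching above can be sketched with plain JDK types. This is an illustration under stated assumptions rather than the decoder itself: a `Map<String, Object>` stands in for `GenericRow`, the batch is already materialized as a list of rows, and the constant below is a hypothetical stand-in for `GenericRow.MULTIPLE_RECORDS_KEY`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of the PR.
final class RowCountDispatchSketch {
  // Stand-in for GenericRow.MULTIPLE_RECORDS_KEY; the value is assumed.
  static final String MULTIPLE_RECORDS_KEY = "$MULTIPLE_RECORDS_KEY$";

  static Map<String, Object> decode(List<Map<String, Object>> batch,
      Map<String, Object> destination) {
    int rowCount = batch.size();
    if (rowCount == 0) {
      // Empty batch: nothing to emit.
      return null;
    }
    if (rowCount == 1) {
      // Single row: copy its fields straight into the destination.
      destination.putAll(batch.get(0));
      return destination;
    }
    // Several rows: nest them all under the multiple-records key.
    destination.put(MULTIPLE_RECORDS_KEY, new ArrayList<>(batch));
    return destination;
  }
}
```

The single-row branch avoids allocating a wrapper list for the common one-message-one-row case, which is presumably why it is handled separately.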
   
   ### Bug fixes vs the prior converter
   
   - `DateDayVector` returns `Integer` (not `LocalDateTime`); the old code cast 
unconditionally to `LocalDateTime` and would throw at runtime for `DateDay` 
columns.
   - `UInt2Vector` returns `Character` (not a `Number`); the old code passed it 
through unchanged, violating the `Int(16) → Integer` contract.
   - `UInt1Vector` values were sign-extended (`200 → -56`) instead of zero-extended.
   - All three are now schema-aware (dispatch on `ArrowType.Int.getIsSigned()` 
/ `ArrowType.Date.getUnit()`).
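
The two unsigned-integer fixes come down to standard JDK widening rules, sketched below. The class and method names are hypothetical, and the `isSigned` parameter merely mimics dispatching on `ArrowType.Int.getIsSigned()`; this is not the extractor's code.

```java
// Hypothetical illustration; not part of the PR.
final class UnsignedIntSketch {
  // 8-bit integer: plain widening sign-extends, so an unsigned column
  // must be zero-extended explicitly.
  static int fromInt8(byte raw, boolean isSigned) {
    return isSigned ? raw : Byte.toUnsignedInt(raw);
  }

  // Unsigned 16-bit value delivered as a char: widening a char to int
  // is already zero-extending, giving a value in [0, 65535].
  static int fromUInt16(char raw) {
    return raw;
  }
}
```

For the byte pattern of 200, `fromInt8((byte) 200, false)` returns 200, whereas the signed path returns -56, the value the old converter produced.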
   
   ### Tests
   
   - New `ArrowRecordExtractorTest` covering every Arrow vector type, raw and 
contract modes, complex types (`List`, `Struct`, `Map`), dictionary encoding, 
and include-list filtering. Each test runs through a real `ArrowStreamWriter` → 
`ArrowStreamReader` IPC roundtrip so `setReader` is exercised against an actual 
`ArrowReader` (no mocks).
   - `ArrowMessageDecoderTest` slimmed to decoder-specific concerns (lifecycle, 
error handling, empty / single / multi-row batch shapes).
   - `ArrowRecordReaderTest` keeps the inherited `AbstractRecordReaderTest` 
round-trip; redundant filter test removed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

