[PR] feat(flink): Backport Flink 2.1 Dremel nested Parquet reader rewrite to hudi-flink1.19.x (FLINK-35702) [hudi]

via GitHub Thu, 21 May 2026 14:41:26 -0700


skywalker0618 opened a new pull request, #18809:
URL: https://github.com/apache/hudi/pull/18809


   ### Describe the issue this Pull Request addresses
   
   Five PRs (#18552, #18567, #18636, #18700, #18701) already landed the Flink 
2.1 Dremel-style nested Parquet reader rewrite for `hudi-flink1.18.x`. This PR 
ports the same set of changes to `hudi-flink1.19.x` so both Flink-version 
modules share the same read path. Tracking JIRA: FLINK-35702.
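
For readers less familiar with the term, "Dremel-style" refers to rebuilding nested values from a flat column of values plus per-value repetition and definition levels. The following is a minimal sketch of that idea for a single optional list of required ints. It is not the code being ported: the class name `DremelListSketch`, the method `assemble`, and the level layout (max definition level 2, max repetition level 1) are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not a class from hudi-flink1.18.x or hudi-flink1.19.x.
final class DremelListSketch {
    // Assumed schema: optional group a (LIST) { repeated group list { required int32 element; } }
    // Definition level 0 = list is null, 1 = list is empty, 2 = an element is present.
    static final int MAX_DEF = 2;

    // Rebuilds one List<Integer> per record from flat level arrays.
    // `values` holds only the elements that are actually present (def == MAX_DEF).
    static List<List<Integer>> assemble(int[] rep, int[] def, int[] values) {
        List<List<Integer>> rows = new ArrayList<List<Integer>>();
        List<Integer> current = null;
        int v = 0; // read cursor into `values`
        for (int i = 0; i < rep.length; i++) {
            if (rep[i] == 0) {
                // Repetition level 0 starts a new record.
                current = (def[i] == 0) ? null : new ArrayList<Integer>();
                rows.add(current);
            }
            if (def[i] == MAX_DEF) {
                // Only fully defined entries carry a value.
                current.add(values[v++]);
            }
        }
        return rows;
    }
}
```

Under these assumptions, the records `[1, 2, 3]`, `null`, `[]`, `[4, 5]` are encoded as repetition levels `0,1,1,0,0,0,1` and definition levels `2,2,2,0,1,2,2`, and `assemble` turns them back into the four records.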
   
   ### Summary and Changelog
   
   Almost the entire change is a verbatim copy of the corresponding files from 
`hudi-flink1.18.x` into `hudi-flink1.19.x`. No semantic adjustments were 
required because the affected classes only use stable Flink core APIs that are 
identical across 1.18 and 1.19.
   
   Detailed file mapping (mirrors the five upstream PRs as a single squashed 
commit):
   
   - New files copied from `hudi-flink1.18.x` (Dremel reader support code 
introduced by #18567 / #18636 / #18700):
     - `cow/utils/{BooleanArrayList,IntArrayList,LongArrayList,NestedPositionUtil}.java`
     - `cow/vector/position/{CollectionPosition,LevelDelegation,RowPosition}.java`
     - `cow/vector/type/{ParquetField,ParquetGroupField,ParquetPrimitiveField}.java`
     - `cow/vector/reader/{NestedColumnReader,NestedPrimitiveColumnReader}.java`
   - Existing files overwritten with the `hudi-flink1.18.x` version (decimal 
fix from #18552 plus the wire-up from #18700):
     - `cow/ParquetSplitReaderUtil.java`
     - `cow/vector/{HeapArrayVector,HeapMapColumnVector,HeapRowColumnVector,ParquetDecimalVector}.java`
     - `cow/vector/reader/{ParquetColumnarRowSplitReader,ParquetDataColumnReaderFactory}.java`
   - Legacy readers deleted (cleanup from #18701, superseded by the new Dremel 
path):
     - `cow/vector/{ColumnarGroupArrayData,ColumnarGroupMapData,ColumnarGroupRowData,HeapArrayGroupColumnVector}.java`
     - `cow/vector/reader/{ArrayColumnReader,ArrayGroupReader,MapColumnReader,RowColumnReader}.java`
   - New tests copied verbatim from `hudi-flink1.18.x`:
     - `TestParquetDecimalVector` (12 tests)
     - `TestHeapColumnVectorAccessors` (4 tests)
     - `TestParquetDataColumnReaderFactory` (13 tests)
     - `TestParquetGroupField` (7 tests)
   
   After the port, the only residual differences between `hudi-flink1.18.x` and 
`hudi-flink1.19.x` are the pre-existing Flink-version adapter shims 
(`MaskingOutputAdapter`, `SupportsPreWriteTopologyAdapter`, `Utils`, test 
`CollectOutputAdapter`, test `MockTaskInfo`), which are unrelated to this 
change and reflect legitimate Flink 1.18-vs-1.19 API differences.
   
   ### Impact
   
   No public API changes. For users of `hudi-flink1.19.x`, the only 
user-visible behavior changes are fixes: the internal Parquet read path now 
matches `hudi-flink1.18.x`, picking up the same correctness fix for 
small-precision decimals (`INT32`/`INT64`-encoded decimals previously failed 
with a `ClassCastException`) and the same nested-schema read support via 
Flink 2.1's Dremel-style column readers.
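
As background for the decimal fix, the sketch below shows why the physical encoding matters: a small-precision decimal may arrive as an `int` or `long` unscaled value rather than as bytes, so a reader that always casts to a byte-backed representation will fail. This is a minimal illustration, not the ported `ParquetDecimalVector`. The class `DecimalDecodeSketch` and its methods are hypothetical, and the precision thresholds (up to 9 digits for `INT32`, up to 18 for `INT64`) are an assumption about the writer's choice of encoding.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration only; not the ParquetDecimalVector implementation.
final class DecimalDecodeSketch {
    enum Physical { INT32, INT64, BINARY }

    // Assumed writer convention: the smallest physical type that can hold the precision.
    static Physical physicalTypeFor(int precision) {
        if (precision <= 9) {
            return Physical.INT32;
        }
        if (precision <= 18) {
            return Physical.INT64;
        }
        return Physical.BINARY;
    }

    // Decodes an unscaled value by inspecting its runtime type instead of
    // casting unconditionally, which is where a ClassCastException would arise.
    static BigDecimal decode(Object raw, int scale) {
        if (raw instanceof Integer) {
            return BigDecimal.valueOf(((Integer) raw).longValue(), scale);
        }
        if (raw instanceof Long) {
            return BigDecimal.valueOf(((Long) raw).longValue(), scale);
        }
        if (raw instanceof byte[]) {
            // Big-endian two's-complement unscaled value.
            return new BigDecimal(new BigInteger((byte[]) raw), scale);
        }
        throw new IllegalArgumentException("Unsupported decimal encoding: " + raw);
    }
}
```

For example, `DecimalDecodeSketch.decode(12345, 2)` yields `123.45`, and the same call shape handles a `long` or a `byte[]` unscaled value.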
   
   ### Risk Level
   
   Low. The change is a verbatim port of code that has already landed and 
stabilized in `hudi-flink1.18.x`. Validation performed in this branch:
   
   - `mvn -pl hudi-flink-datasource/hudi-flink1.19.x -am -Pflink1.19 
test-compile` succeeds.
   - All 36 unit tests in the ported test classes pass under the `flink1.19` 
profile (TestParquetDecimalVector 12/12, TestHeapColumnVectorAccessors 4/4, 
TestParquetDataColumnReaderFactory 13/13, TestParquetGroupField 7/7).
   - Static checks: no remaining references to the deleted legacy readers 
anywhere in `hudi-flink1.19.x/` or the shared `hudi-flink/` module; no imports 
introduced that depend on Flink-1.18-only APIs.
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable


