skywalker0618 opened a new pull request, #18809:
URL: https://github.com/apache/hudi/pull/18809
### Describe the issue this Pull Request addresses
Five PRs (#18552, #18567, #18636, #18700, #18701) already landed the Flink
2.1 Dremel-style nested Parquet reader rewrite for `hudi-flink1.18.x`. This PR
ports the same set of changes to `hudi-flink1.19.x` so both Flink-version
modules share the same read path. Tracking JIRA: FLINK-35702.
### Summary and Changelog
Almost the entire change is a verbatim copy of the corresponding files from
`hudi-flink1.18.x` into `hudi-flink1.19.x`. No semantic adjustments were
required because the affected classes only use stable Flink core APIs that are
identical across 1.18 and 1.19.
Detailed file mapping (mirrors the five upstream PRs as a single squashed
commit):
- New files copied from `hudi-flink1.18.x` (Dremel reader support code introduced by #18567 / #18636 / #18700):
  - `cow/utils/{BooleanArrayList,IntArrayList,LongArrayList,NestedPositionUtil}.java`
  - `cow/vector/position/{CollectionPosition,LevelDelegation,RowPosition}.java`
  - `cow/vector/type/{ParquetField,ParquetGroupField,ParquetPrimitiveField}.java`
  - `cow/vector/reader/{NestedColumnReader,NestedPrimitiveColumnReader}.java`
- Existing files overwritten with the `hudi-flink1.18.x` version (decimal fix from #18552 plus the wire-up from #18700):
  - `cow/ParquetSplitReaderUtil.java`
  - `cow/vector/{HeapArrayVector,HeapMapColumnVector,HeapRowColumnVector,ParquetDecimalVector}.java`
  - `cow/vector/reader/{ParquetColumnarRowSplitReader,ParquetDataColumnReaderFactory}.java`
- Legacy readers deleted (cleanup from #18701, superseded by the new Dremel path):
  - `cow/vector/{ColumnarGroupArrayData,ColumnarGroupMapData,ColumnarGroupRowData,HeapArrayGroupColumnVector}.java`
  - `cow/vector/reader/{ArrayColumnReader,ArrayGroupReader,MapColumnReader,RowColumnReader}.java`
- New tests copied verbatim from `hudi-flink1.18.x`:
  - `TestParquetDecimalVector` (12 tests)
  - `TestHeapColumnVectorAccessors` (4 tests)
  - `TestParquetDataColumnReaderFactory` (13 tests)
  - `TestParquetGroupField` (7 tests)
After the port, the only residual differences between `hudi-flink1.18.x` and
`hudi-flink1.19.x` are the pre-existing Flink-version adapter shims
(`MaskingOutputAdapter`, `SupportsPreWriteTopologyAdapter`, `Utils`, test
`CollectOutputAdapter`, test `MockTaskInfo`), which are unrelated to this
change and reflect legitimate Flink 1.18-vs-1.19 API differences.
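
For reviewers less familiar with the Dremel encoding behind the nested reader files listed above, the following is a minimal, self-contained sketch of the general idea: rebuilding list boundaries from repetition and definition levels. It is not the ported code. `ListOffsetsSketch` and `fromLevels` are hypothetical names, and the level meanings assume a standard three-level Parquet list (`optional group (LIST) { repeated group list { optional element } }`), where definition level 0 means a null list, 1 an empty list, 2 a null element, and 3 a present element.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not a class from hudi-flink1.18.x or hudi-flink1.19.x. */
final class ListOffsetsSketch {
    // Assumed definition levels for: optional list -> repeated group -> optional element.
    static final int DEF_LIST_NULL = 0;
    static final int DEF_LIST_EMPTY = 1;

    /** offsets[r] .. offsets[r + 1] is the element range of row r. */
    final int[] offsets;
    /** nullRows[r] is true when the list itself is null for row r. */
    final boolean[] nullRows;

    private ListOffsetsSketch(int[] offsets, boolean[] nullRows) {
        this.offsets = offsets;
        this.nullRows = nullRows;
    }

    static ListOffsetsSketch fromLevels(int[] repetitionLevels, int[] definitionLevels) {
        if (repetitionLevels.length != definitionLevels.length) {
            throw new IllegalArgumentException("Level arrays must have the same length");
        }
        List<Integer> starts = new ArrayList<>();
        List<Boolean> nulls = new ArrayList<>();
        int elements = 0;
        for (int i = 0; i < repetitionLevels.length; i++) {
            // Repetition level 0 marks the first entry of a new top-level row.
            if (repetitionLevels[i] == 0) {
                starts.add(elements);
                nulls.add(definitionLevels[i] == DEF_LIST_NULL);
            }
            // Null and empty lists contribute no element slot; null elements do.
            if (definitionLevels[i] > DEF_LIST_EMPTY) {
                elements++;
            }
        }
        int rows = starts.size();
        int[] offsets = new int[rows + 1];
        boolean[] nullRows = new boolean[rows];
        for (int r = 0; r < rows; r++) {
            offsets[r] = starts.get(r);
            nullRows[r] = nulls.get(r);
        }
        offsets[rows] = elements;
        return new ListOffsetsSketch(offsets, nullRows);
    }
}
```

Under these assumptions, the rows `[1, 2]`, `null`, `[]`, `[3, null]` are encoded as repetition levels `{0, 1, 0, 0, 0, 1}` and definition levels `{3, 3, 0, 1, 3, 2}`, and `fromLevels` returns offsets `{0, 2, 2, 2, 4}` with only the second row flagged as null.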
### Impact
No public API changes and no user-facing behavior changes for users of
`hudi-flink1.19.x`. The internal Parquet read path now matches
`hudi-flink1.18.x`, picking up the same correctness fix for small-precision
decimals (previously a `ClassCastException` for `INT32`/`INT64`-encoded
decimals) and the same nested-schema read support via Flink 2.1's Dremel-style
column readers.
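
As background on the decimal fix: the Parquet format allows a `DECIMAL` column to be stored as `INT32` (precision up to 9), `INT64` (precision up to 18), or as a byte array, so a reader that assumes bytes can fail on small-precision columns. The sketch below illustrates that dispatch using only the JDK. `DecimalDecodeSketch` and its methods are hypothetical names for illustration and are not the code in `ParquetDecimalVector`.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical illustration only; not a Hudi or Flink class. */
final class DecimalDecodeSketch {

    /** Physical encodings the Parquet format permits for DECIMAL values. */
    enum Physical { INT32, INT64, BINARY }

    /** Narrowest physical type that can hold a decimal of the given precision. */
    static Physical narrowestEncoding(int precision) {
        if (precision <= 9) {
            return Physical.INT32;
        }
        if (precision <= 18) {
            return Physical.INT64;
        }
        return Physical.BINARY;
    }

    /** Decodes a raw column value by inspecting its type instead of casting blindly. */
    static BigDecimal decode(Object raw, int scale) {
        if (raw instanceof Integer) {
            return BigDecimal.valueOf(((Integer) raw).longValue(), scale);
        }
        if (raw instanceof Long) {
            return BigDecimal.valueOf((Long) raw, scale);
        }
        if (raw instanceof byte[]) {
            // Byte-array decimals hold the unscaled value as big-endian two's complement.
            return new BigDecimal(new BigInteger((byte[]) raw), scale);
        }
        throw new IllegalArgumentException("Unsupported decimal encoding: " + raw);
    }
}
```

For example, `DecimalDecodeSketch.decode(12345, 2)` yields `123.45`, whereas an unconditional `(byte[]) raw` cast on the same input would throw a `ClassCastException`.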
### Risk Level
Low. The change is a verbatim port of code that has already landed and
stabilized in `hudi-flink1.18.x`. Validation performed in this branch:
- `mvn -pl hudi-flink-datasource/hudi-flink1.19.x -am -Pflink1.19 test-compile` succeeds.
- All 36 unit tests in the ported test classes pass under the `flink1.19`
profile (TestParquetDecimalVector 12/12, TestHeapColumnVectorAccessors 4/4,
TestParquetDataColumnReaderFactory 13/13, TestParquetGroupField 7/7).
- Static checks: no remaining references to the deleted legacy readers
anywhere in `hudi-flink1.19.x/` or the shared `hudi-flink/` module; no imports
introduced that depend on Flink-1.18-only APIs.
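
One way to reproduce the static check for leftover references is sketched below. It is an illustrative stand-in, not the tooling actually used for this PR: `LegacyReaderScanSketch` is a hypothetical name, and the root directory to scan is whatever module path the caller passes in. The class names in the pattern are the eight legacy readers listed in the changelog.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not part of the Hudi build. */
final class LegacyReaderScanSketch {
    // Word boundaries keep e.g. NestedColumnReader from matching a legacy name.
    private static final Pattern LEGACY = Pattern.compile(
        "\\b(ColumnarGroupArrayData|ColumnarGroupMapData|ColumnarGroupRowData|"
            + "HeapArrayGroupColumnVector|ArrayColumnReader|ArrayGroupReader|"
            + "MapColumnReader|RowColumnReader)\\b");

    /** Returns every .java file under root that still mentions a deleted legacy reader. */
    static List<Path> filesReferencingLegacyReaders(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.toString().endsWith(".java"))
                .filter(LegacyReaderScanSketch::mentionsLegacyReader)
                .sorted()
                .collect(Collectors.toList());
        }
    }

    private static boolean mentionsLegacyReader(Path file) {
        try {
            String source = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            return LEGACY.matcher(source).find();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An empty result for both the `hudi-flink1.19.x/` and `hudi-flink/` source trees would correspond to the "no remaining references" claim above. Note that the scan is textual, so mentions in comments also count as hits.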
### Documentation Update
none
### Contributor's checklist
- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable