tmater opened a new pull request, #14588:
URL: https://github.com/apache/iceberg/pull/14588
## Summary
Adds variant type support to `ParquetTypeVisitor` and all its subclasses to
enable proper handling of Parquet variant logical types during schema
operations.
## Background
This issue surfaced when using `ParquetUtil.footerMetrics()`, which calls
`convertAndPrune()` on the Parquet schema. The existing `TestVariantMetrics`
did not expose the gap because it uses `writeParquet()` and calls
`ParquetMetrics.metrics()` directly, bypassing the schema conversion path.
Without `variant()` implementations, variant fields were skipped during schema
conversion, and `TypeWithSchemaVisitor` then threw an NPE when it tried to
process a variant field that was missing from the converted schema.
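The failure mode described above can be illustrated with a minimal, self-contained sketch. It does not use the real Iceberg or Parquet classes: `Node`, `Visitor`, `ToTypeString`, and `ToTypeStringWithVariant` are hypothetical stand-ins, and the assumption is simply that a visitor whose `variant()` hook returns `null` causes the field to be dropped from the converted result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VariantVisitorSketch {
  // Hypothetical stand-in for a Parquet schema node (not org.apache.parquet's GroupType).
  static final class Node {
    final String name;
    final String kind; // "primitive", "struct", or "variant"
    final List<Node> children;

    Node(String name, String kind, List<Node> children) {
      this.name = name;
      this.kind = kind;
      this.children = children;
    }
  }

  // Hypothetical base visitor: dispatches on node kind.
  // The default variant() returns null, modelling a subclass that has no variant handling.
  static class Visitor<T> {
    T primitive(Node node) { return null; }
    T variant(Node node) { return null; }
    T struct(Node node, List<T> fields) { return null; }

    static <T> T visit(Node node, Visitor<T> visitor) {
      switch (node.kind) {
        case "variant":
          return visitor.variant(node);
        case "struct": {
          List<T> results = new ArrayList<>();
          for (Node child : node.children) {
            results.add(visit(child, visitor));
          }
          return visitor.struct(node, results);
        }
        default:
          return visitor.primitive(node);
      }
    }
  }

  // Converter without a variant() override: variant fields come back as null and are skipped.
  static class ToTypeString extends Visitor<String> {
    @Override
    String primitive(Node node) { return node.name + ": primitive"; }

    @Override
    String struct(Node node, List<String> fields) {
      List<String> kept = new ArrayList<>();
      for (String field : fields) {
        if (field != null) { // null results are silently dropped
          kept.add(field);
        }
      }
      return "struct<" + String.join(", ", kept) + ">";
    }
  }

  // Same converter with the variant() hook implemented, so the field survives conversion.
  static class ToTypeStringWithVariant extends ToTypeString {
    @Override
    String variant(Node node) { return node.name + ": variant"; }
  }

  static String convert(Node schema, Visitor<String> visitor) {
    return Visitor.visit(schema, visitor);
  }

  static Node sampleSchema() {
    Node id = new Node("id", "primitive", new ArrayList<>());
    Node v = new Node("v", "variant", new ArrayList<>());
    return new Node("root", "struct", Arrays.asList(id, v));
  }

  public static void main(String[] args) {
    System.out.println(convert(sampleSchema(), new ToTypeString()));            // struct<id: primitive>
    System.out.println(convert(sampleSchema(), new ToTypeStringWithVariant())); // struct<id: primitive, v: variant>
  }
}
```

In this sketch the first conversion silently loses field `v`; any later step that walks the original schema alongside the converted one would find no counterpart for it, which is the kind of mismatch that can surface as an NPE.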
## Changes
- Add `variant(GroupType)` method to `ParquetTypeVisitor` base class
- Implement `variant()` in all `ParquetTypeVisitor` subclasses:
  - `MessageTypeToType` - converts Parquet variant to Iceberg `VariantType`
  - `ApplyNameMapping` - applies name mappings to variant fields
  - `ParquetSchemaUtil.HasIds` - checks for field IDs in variant types
  - `RemoveIds` - removes IDs from variant schemas
- Add test `testVariantTypeConversion()` in `TestParquetSchemaUtil`
## Testing
New test validates schema conversion from Parquet variant logical type to
Iceberg `VariantType`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]