rdblue commented on issue #2962: URL: https://github.com/apache/iceberg/issues/2962#issuecomment-946104928
Thanks to @kbendick for providing a table that reproduces the problem, I've been able to debug this and find out what's going on.

The issue is in column pruning. When Iceberg prunes the fields in the file schema using the expected Iceberg schema, it can return a Parquet schema that no longer matches the file because of the field rename. Iceberg uses Parquet's `Types.map` builder to produce a [new map type in `PruneColumns`](https://github.com/apache/iceberg/blob/0.11.x/parquet/src/main/java/org/apache/iceberg/parquet/PruneColumns.java#L133-L136). But that builder was changed in https://github.com/apache/parquet-mr/pull/798 to produce a key/value record named `key_value` instead of `map`. So rebuilding the type using Parquet's helpers actually produces a type with different names. Iceberg then passes the new schema into Parquet as the projection, and that causes the map column to be dropped because the file has no `mapCol.key_value` structure; it has `mapCol.map` instead.

The reason this sometimes works in 0.12.0 is that the value check changed to `equals` instead of identity (`==`), so if you project the whole value, the original map is returned. You can reproduce the issue in 0.12.0 by selecting a projection of the map value rather than `*`. For example:

```sql
SELECT mapCol.value.str FROM repro_table
```

The solution is to rebuild the map structure to exactly match the incoming file schema rather than relying on Parquet to produce the same thing across versions. I'll open a PR.
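To illustrate the mismatch described above, here is a minimal, self-contained sketch in plain Java. It does not use Iceberg's or Parquet's classes: `Node`, `pruneValue`, and `projectionMatches` are hypothetical stand-ins, and the `num` field is invented so that the map value has something to prune. The only point is to show that a pruned map rebuilt with a fixed inner name (`key_value`) no longer lines up with a file that wrote `map`, while one that reuses the file's own name still does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MapPruneSketch {
    /** Toy stand-in for a Parquet type: a name plus child fields (none for primitives). */
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        /** Returns the child with the given name, or null if there is none. */
        Node child(String childName) {
            for (Node c : children) {
                if (c.name.equals(childName)) {
                    return c;
                }
            }
            return null;
        }
    }

    /** Collects the dotted path of every leaf column under the given node. */
    static List<String> leafPaths(Node node, String prefix, List<String> out) {
        String path = prefix.isEmpty() ? node.name : prefix + "." + node.name;
        if (node.children.isEmpty()) {
            out.add(path);
        }
        for (Node c : node.children) {
            leafPaths(c, path, out);
        }
        return out;
    }

    /** Checks, by name only, whether a dotted path exists in the file's map column. */
    static boolean existsInFile(Node fileMap, String path) {
        String[] parts = path.split("\\.");
        if (!parts[0].equals(fileMap.name)) {
            return false;
        }
        Node current = fileMap;
        for (int i = 1; i < parts.length; i++) {
            current = current.child(parts[i]);
            if (current == null) {
                return false;
            }
        }
        return true;
    }

    /** Prunes the map value to a single field, naming the repeated group as requested. */
    static Node pruneValue(Node fileMap, String keepField, String repeatedName) {
        Node repeated = fileMap.children.get(0);
        Node key = repeated.child("key");
        Node kept = repeated.child("value").child(keepField);
        return new Node(fileMap.name, new Node(repeatedName, key, new Node("value", kept)));
    }

    /** A projection matches only if every one of its leaf paths exists in the file. */
    static boolean projectionMatches(Node fileMap, Node projection) {
        for (String path : leafPaths(projection, "", new ArrayList<>())) {
            if (!existsInFile(fileMap, path)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // File written with the older naming: the repeated group is called "map".
        Node fileMap = new Node("mapCol",
            new Node("map", new Node("key"), new Node("value", new Node("str"), new Node("num"))));

        // Rebuilt with a fixed name, as a newer builder would do.
        Node fixedName = pruneValue(fileMap, "str", "key_value");
        // Rebuilt by copying the name found in the file.
        Node sameName = pruneValue(fileMap, "str", fileMap.children.get(0).name);

        System.out.println(projectionMatches(fileMap, fixedName)); // false
        System.out.println(projectionMatches(fileMap, sameName));  // true
    }
}
```

In this toy model, the first projection asks for `mapCol.key_value.value.str`, which the file does not contain, while the second asks for `mapCol.map.value.str`, which it does. Copying names from the incoming file schema keeps the projection independent of whichever naming the builder defaults to.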
