rdblue commented on issue #2962:
URL: https://github.com/apache/iceberg/issues/2962#issuecomment-946104928


Thanks to @kbendick for providing a table that reproduces the problem, I've been able to debug what's going on here. The issue is in column pruning. When Iceberg prunes the fields in the file schema using the expected Iceberg schema, it can return a Parquet schema that no longer matches the file because of the field rename.
   
Iceberg uses Parquet's `Types.map` builder to produce a [new map type in `PruneColumns`](https://github.com/apache/iceberg/blob/0.11.x/parquet/src/main/java/org/apache/iceberg/parquet/PruneColumns.java#L133-L136). But that builder was changed in https://github.com/apache/parquet-mr/pull/798 to name the repeated key/value group `key_value` instead of `map`, so rebuilding the type using Parquet's helpers actually produces a type with different names. Iceberg then passes the new schema into Parquet as the projection, and the map column is dropped because the file has no `mapCol.key_value` structure; it has `mapCol.map` instead.
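
To make the failure mode concrete, here is a minimal, self-contained Java sketch. It is not Parquet's projection code: it assumes a simplified model in which each leaf column is identified by its dotted path and the projection keeps only the file columns whose path is requested. The class `ProjectionByPath` and its `project` method are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified model of projection: leaf columns are identified by
// dotted paths, and only file columns whose exact path is requested are kept.
public class ProjectionByPath {

    // Keep the file columns whose full path appears in the requested projection.
    public static List<String> project(List<String> fileColumns, Set<String> requested) {
        return fileColumns.stream()
                .filter(requested::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Leaf paths as written in the file, where the repeated group is named "map".
        List<String> fileColumns = List.of("mapCol.map.key", "mapCol.map.value.str");
        // Projection rebuilt by a builder that names the repeated group "key_value".
        Set<String> requested = Set.of("mapCol.key_value.key", "mapCol.key_value.value.str");
        System.out.println(project(fileColumns, requested)); // prints []
    }
}
```

Because the repeated group's name is part of every leaf path under the map, renaming that one group is enough for none of the requested paths to match.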
   
The reason this sometimes works in 0.12.0 is that the value check changed from identity (`==`) to `equals`, so if you project the whole value, the original map is returned. You can reproduce the issue in 0.12.0 by selecting a projection of the map value rather than `*`. For example:
   
   ```sql
   SELECT mapCol.value.str FROM repro_table
   ```
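
The difference between the two checks can be sketched in Java with a hypothetical `MapType` record standing in for the Parquet group type (Iceberg's real `PruneColumns` visitor works on Parquet `GroupType`s, not on this record). The sketch assumes that rebuilding always names the repeated group `key_value`, as described above for the newer `Types.map` builder.

```java
import java.util.ArrayList;
import java.util.List;

public class PruneMapCheck {
    // Hypothetical stand-in for a Parquet map type: column name, name of the
    // repeated key/value group, key type, and the value's field names.
    public record MapType(String name, String repeatedGroupName, String keyType, List<String> valueFields) {}

    // Return the original map when the pruned value counts as unchanged; otherwise
    // rebuild it the way the newer builder would, naming the repeated group "key_value".
    public static MapType pruneMap(MapType original, List<String> prunedValue, boolean useEquals) {
        boolean sameValue = useEquals
                ? original.valueFields().equals(prunedValue) // equality check (0.12.0 behavior)
                : original.valueFields() == prunedValue;     // identity check (earlier behavior)
        if (sameValue) {
            return original; // keeps the file's structure, including its group name
        }
        return new MapType(original.name(), "key_value", original.keyType(), prunedValue);
    }

    public static void main(String[] args) {
        MapType fileMap = new MapType("mapCol", "map", "string", List.of("str", "num"));
        // Projecting the whole value yields an equal, but not identical, field list.
        List<String> whole = new ArrayList<>(fileMap.valueFields());
        System.out.println(pruneMap(fileMap, whole, false).repeatedGroupName()); // key_value
        System.out.println(pruneMap(fileMap, whole, true).repeatedGroupName());  // map
        // Projecting only mapCol.value.str changes the value, so the map is rebuilt.
        System.out.println(pruneMap(fileMap, List.of("str"), true).repeatedGroupName()); // key_value
    }
}
```

Only the `equals` check preserves the file's `map` group when the whole value is projected; any narrower projection still goes through the rebuild and picks up the `key_value` name.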
   
The solution is to rebuild the map structure so that it exactly matches the incoming file schema, rather than relying on Parquet to produce the same thing across versions. I'll open a PR.
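
A sketch of that approach, again using a hypothetical `MapType` record rather than Parquet's `GroupType`: when the value is pruned, the map is rebuilt from the file's own column name and repeated group name instead of a builder default. This only illustrates the idea; it is not the code in the PR.

```java
import java.util.List;

public class PruneMapFixed {
    // Hypothetical stand-in for a Parquet map type: column name, name of the
    // repeated key/value group, key type, and the value's field names.
    public record MapType(String name, String repeatedGroupName, String keyType, List<String> valueFields) {}

    // Rebuild the pruned map from the file's own structure, reusing the column name
    // and repeated group name read from the file instead of any builder default.
    public static MapType pruneMap(MapType fileMap, List<String> prunedValue) {
        if (fileMap.valueFields().equals(prunedValue)) {
            return fileMap;
        }
        return new MapType(fileMap.name(), fileMap.repeatedGroupName(), fileMap.keyType(), prunedValue);
    }

    public static void main(String[] args) {
        MapType oldFile = new MapType("mapCol", "map", "string", List.of("str", "num"));
        MapType newFile = new MapType("mapCol", "key_value", "string", List.of("str", "num"));
        System.out.println(pruneMap(oldFile, List.of("str")).repeatedGroupName()); // map
        System.out.println(pruneMap(newFile, List.of("str")).repeatedGroupName()); // key_value
    }
}
```

Because the group name is copied from the file, the pruned projection matches files written with either naming convention.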

