IgorBerman opened a new pull request, #14744: URL: https://github.com/apache/iceberg/pull/14744
When pruning nested structures (lists, maps, structs), the PruneColumns visitor incorrectly returned the original, unpruned field whenever the container's field ID was in the selectedIds set, even when child fields had been pruned.

This fix ensures that:

1. In struct(): when a field is selected and has been pruned (field != originalField), the pruned version is used instead of the original.
2. In list(): the pruned element is checked first, before checking whether elementId is selected, so nested pruning is applied.
3. In map(): similarly, the pruned value is checked before checking selected keys/values.
4. validatePrunedField() is added to verify that pruned fields stay compatible with the original fields (same name, ID, and repetition).

This enables proper column pruning for deeply nested schemas such as:

    list<struct<field1, nested_list: list<struct<a, b, c, d>>>>

When projecting only field1, nested_list[].a, and nested_list[].b, the fix ensures that fields c and d are pruned from the Parquet projection schema.
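The ordering change described above can be illustrated with a small, self-contained sketch. This is not the code from the pull request and does not use Iceberg's or Parquet's real classes: `PruneSketch`, `Node`, `prune`, `validate`, the `preferPruned` flag, and the field IDs 1 through 9 are all hypothetical names and values invented for illustration. The sketch assumes a simplified model in which every container is a node with children, and it omits repetition because the toy model has no such concept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PruneSketch {

  /** Hypothetical, simplified schema node; not a real Iceberg or Parquet type. */
  static final class Node {
    final int id;
    final String name;
    final List<Node> children; // empty for primitives

    Node(int id, String name, Node... children) {
      this.id = id;
      this.name = name;
      this.children = Arrays.asList(children);
    }

    Node withChildren(List<Node> newChildren) {
      return new Node(id, name, newChildren.toArray(new Node[0]));
    }

    @Override
    public String toString() {
      if (children.isEmpty()) {
        return name;
      }
      StringBuilder sb = new StringBuilder(name).append("<");
      for (int i = 0; i < children.size(); i++) {
        if (i > 0) {
          sb.append(", ");
        }
        sb.append(children.get(i));
      }
      return sb.append(">").toString();
    }
  }

  /** Toy analogue of the compatibility check: a pruned node keeps its ID and name. */
  static Node validate(Node original, Node pruned) {
    if (original.id != pruned.id || !original.name.equals(pruned.name)) {
      throw new IllegalStateException("Pruned field is incompatible with " + original.name);
    }
    return pruned;
  }

  /**
   * Returns the pruned node, or null if neither the node nor any descendant is selected.
   * With preferPruned = false, a selected container is returned whole (the behavior the
   * description calls incorrect); with preferPruned = true, a pruned result wins.
   */
  static Node prune(Node original, Set<Integer> selectedIds, boolean preferPruned) {
    if (original.children.isEmpty()) {
      return selectedIds.contains(original.id) ? original : null;
    }
    boolean selected = selectedIds.contains(original.id);
    if (selected && !preferPruned) {
      return original; // selection checked first: nested pruning is lost
    }
    List<Node> kept = new ArrayList<>();
    for (Node child : original.children) {
      Node pruned = prune(child, selectedIds, preferPruned);
      if (pruned != null) {
        kept.add(pruned);
      }
    }
    if (!kept.isEmpty()) {
      return validate(original, original.withChildren(kept)); // pruned version first
    }
    return selected ? original : null; // nothing selected below: keep the whole container
  }

  /** list<struct<field1, nested_list: list<struct<a, b, c, d>>>> with made-up IDs. */
  static Node example() {
    Node inner = new Node(5, "element",
        new Node(6, "a"), new Node(7, "b"), new Node(8, "c"), new Node(9, "d"));
    Node outer = new Node(2, "element",
        new Node(3, "field1"), new Node(4, "nested_list", inner));
    return new Node(1, "items", outer);
  }

  public static void main(String[] args) {
    // Container IDs 1 and 4 are selected along with leaves field1, a, and b.
    Set<Integer> ids = new HashSet<>(Arrays.asList(1, 3, 4, 6, 7));
    System.out.println(prune(example(), ids, false));
    System.out.println(prune(example(), ids, true));
  }
}
```

In this toy model the first call prints the full schema, including c and d, because the selected outer container short-circuits the traversal. The second call prints `items<element<field1, nested_list<element<a, b>>>>`, which corresponds to the projection the description expects.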
