RussellSpitzer edited a comment on issue #2783: URL: https://github.com/apache/iceberg/issues/2783#issuecomment-886952345
So I think I tracked this down. The basic issue is that Spark 3.1 correctly prunes nested structs and Spark 3.0 does not. You may wonder: if Spark 3.1 correctly prunes nested structs, why is this an issue? The problem is that we end up reading only 2 fields out of our metadata tables and presenting them correctly, but our UnsafeProjection creation code assumes that if a nested struct is read, then all of its fields are read. So we build a projection that requires every column in the struct, rather than just the ones we actually extracted, which means the projection is broken. See the RowDataReader projection, which only prunes at the top level: https://github.com/apache/iceberg/blob/a79de571860a290f6e96ac562d616c9c6be2071e/spark/src/main/java/org/apache/iceberg/spark/source/RowDataReader.java#L208-L211 If we never prune columns out of the struct this is fine; if we do, we have a problem.
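
To make the mismatch concrete, here is a minimal, hypothetical sketch (not the actual RowDataReader code) built directly against Spark's Catalyst `UnsafeProjection` API. The field names (`data_file`, `file_path`, `file_format`, `record_count`) are illustrative stand-ins for a metadata-table struct. The projection is created from the full nested schema, but the row it receives only carries the one nested field that Spark 3.1 actually read, so applying it typically fails with an index error (or produces corrupt data):

```java
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow;
import org.apache.spark.sql.catalyst.expressions.UnsafeProjection;
import org.apache.spark.sql.catalyst.expressions.UnsafeRow;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.unsafe.types.UTF8String;

public class NestedPruningMismatch {
  public static void main(String[] args) {
    // Full nested schema the projection is built against: a struct with three fields.
    StructType fullNested = new StructType(new StructField[] {
        new StructField("file_path", DataTypes.StringType, false, Metadata.empty()),
        new StructField("file_format", DataTypes.StringType, false, Metadata.empty()),
        new StructField("record_count", DataTypes.LongType, false, Metadata.empty())
    });
    StructType fullSchema = new StructType(new StructField[] {
        new StructField("data_file", fullNested, false, Metadata.empty())
    });

    // Projection created as if every nested field were present (top-level pruning only).
    UnsafeProjection projection = UnsafeProjection.create(fullSchema);

    // Row as actually materialized after Spark 3.1 prunes the nested struct to one field.
    InternalRow prunedStruct = new GenericInternalRow(
        new Object[] { UTF8String.fromString("s3://bucket/data/file-0001.parquet") });
    InternalRow row = new GenericInternalRow(new Object[] { prunedStruct });

    // The projection still tries to copy file_format and record_count out of the
    // one-field struct, so this blows up instead of projecting the pruned row.
    UnsafeRow result = projection.apply(row);
    System.out.println(result);
  }
}
```

If the nested struct were never pruned (Spark 3.0 behavior), the row would match the schema the projection was generated from and this would work fine, which is why the issue only shows up on Spark 3.1.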
