RussellSpitzer edited a comment on issue #2783: URL: https://github.com/apache/iceberg/issues/2783#issuecomment-886952345
So I think I tracked this down. The basic issue is that Spark 3.1 correctly prunes nested structs and Spark 3.0 does not. You may wonder: if Spark 3.1 correctly prunes nested structs, why is this an issue? The problem is that we end up reading only 2 fields out of our metadata tables and presenting them correctly, but our UnsafeProjection creation code assumes that if a nested struct is read, then all of its fields are read. So we build a projection that requires every column in the struct, rather than just the ones we actually extracted, which means the projection is broken. See the RowDataReader projection, which only prunes at the top level: https://github.com/apache/iceberg/blob/a79de571860a290f6e96ac562d616c9c6be2071e/spark/src/main/java/org/apache/iceberg/spark/source/RowDataReader.java#L208-L211 If we never prune columns out of the struct this is fine; if we do, we have a problem.
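
To make the mismatch concrete, here is a minimal, hypothetical sketch (not the actual RowDataReader code) built directly against Spark's Catalyst `UnsafeProjection` API. The field names (`data_file`, `file_path`, `file_format`, `record_count`) are illustrative stand-ins for a metadata-table struct. The projection is created from the full nested schema, but the row it receives only carries the one nested field that Spark 3.1 actually read, so applying it typically fails with an index error (or produces corrupt data):

```java
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow;
import org.apache.spark.sql.catalyst.expressions.UnsafeProjection;
import org.apache.spark.sql.catalyst.expressions.UnsafeRow;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.unsafe.types.UTF8String;

public class NestedPruningMismatch {
  public static void main(String[] args) {
    // Full nested schema the projection is built against: a struct with three fields.
    StructType fullNested = new StructType(new StructField[] {
        new StructField("file_path", DataTypes.StringType, false, Metadata.empty()),
        new StructField("file_format", DataTypes.StringType, false, Metadata.empty()),
        new StructField("record_count", DataTypes.LongType, false, Metadata.empty())
    });
    StructType fullSchema = new StructType(new StructField[] {
        new StructField("data_file", fullNested, false, Metadata.empty())
    });

    // Projection created as if every nested field were present (top-level pruning only).
    UnsafeProjection projection = UnsafeProjection.create(fullSchema);

    // Row as actually materialized after Spark 3.1 prunes the nested struct to one field.
    InternalRow prunedStruct = new GenericInternalRow(
        new Object[] { UTF8String.fromString("s3://bucket/data/file-0001.parquet") });
    InternalRow row = new GenericInternalRow(new Object[] { prunedStruct });

    // The projection still tries to copy file_format and record_count out of the
    // one-field struct, so this blows up instead of projecting the pruned row.
    UnsafeRow result = projection.apply(row);
    System.out.println(result);
  }
}
```

If the nested struct were never pruned (Spark 3.0 behavior), the row would match the schema the projection was generated from and this would work fine, which is why the issue only shows up on Spark 3.1.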
