[GitHub] [iceberg] RussellSpitzer edited a comment on pull request #2877: Spark: Fix nested struct pruning

GitBox Tue, 27 Jul 2021 18:52:57 -0700


RussellSpitzer edited a comment on pull request #2877:
URL: https://github.com/apache/iceberg/pull/2877#issuecomment-887947654



   OK, so trying to fix this from the Source side: the issue for the Entries 
table is that, although it reports a schema of
   ```
   status, snapshot_id, sequence_number, data_file <Struct with 15 fields>
   ```
   
   The manifest reader is allowed to project within data_file, which means the 
actual GenericManifestFiles it creates have a schema of
   ```
   status, snapshot_id, sequence_number, data_file <Struct with pruned columns>
   ```
   
   This means the table schema set in the read tasks is incorrect: it does not 
match the structure of the data that is actually read.
   
   Creating GenericManifestFile with a projection of the data_file column:
   
   
https://github.com/apache/iceberg/blob/83ebd4ed57254822ca26ef9b7a5ea6f528da8b34/core/src/main/java/org/apache/iceberg/ManifestEntriesTable.java#L141-L142
   
   Creating the Spark StructInternalRow representation using the incorrect 
schema (the full table schema, not the projected schema used in 
GenericManifestFile):
   
   
https://github.com/apache/iceberg/blob/c69da8a8c1c2f99de3a1b826514775f0f07bde72/spark/src/main/java/org/apache/iceberg/spark/source/RowDataReader.java#L189
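
To make the mismatch concrete, here is a toy sketch in plain Java. It uses no Iceberg or Spark classes: `ProjectionMismatchSketch`, `readField`, and the three-field subset of data_file are stand-ins made up for illustration, and the real code paths linked above may fail differently. It only shows why resolving field positions from the full schema goes wrong when the row was built from a pruned projection.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not an Iceberg or Spark class.
class ProjectionMismatchSketch {

  // Resolves a field's ordinal from the given schema (a list of field names)
  // and reads that position from the row, the way a positional row wrapper would.
  static Object readField(List<String> schema, Object[] row, String field) {
    int pos = schema.indexOf(field);
    if (pos < 0) {
      throw new IllegalArgumentException("Unknown field: " + field);
    }
    return row[pos];
  }

  public static void main(String[] args) {
    // Illustrative subset of the data_file struct (assumption: three fields only).
    List<String> fullSchema = Arrays.asList("file_path", "file_format", "record_count");

    // The reader projected only record_count, so each row holds a single value.
    List<String> projectedSchema = Arrays.asList("record_count");
    Object[] prunedRow = new Object[] {100L};

    // Positions taken from the schema that produced the row: reads the value.
    System.out.println(readField(projectedSchema, prunedRow, "record_count"));

    // Positions taken from the full schema: record_count resolves to ordinal 2,
    // which does not exist in the pruned row.
    try {
      readField(fullSchema, prunedRow, "record_count");
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("full schema does not match pruned row: " + e.getMessage());
    }
  }
}
```

In this toy version the mismatch surfaces as an out-of-bounds read. With a wider projection the lookup could instead land on a valid position holding a different column, which fails silently.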


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


