RussellSpitzer edited a comment on pull request #2877: URL: https://github.com/apache/iceberg/pull/2877#issuecomment-887947654
OK, so trying to fix this from the Source side: the issue for the Entries table is that although it reports a schema of
```
status, snapshot_id, sequence_number, data_file <Struct with 15 fields>
```
the manifest reader is allowed to project within data_file, which means the actual GenericManifestFiles it creates have a schema of
```
status, snapshot_id, sequence_number, data_file <pruned columns>
```
This means the table schema set in the read tasks is incorrect and does not match what is actually in the read data.

Creating GenericManifestFile with a projection of the data_file column:
https://github.com/apache/iceberg/blob/83ebd4ed57254822ca26ef9b7a5ea6f528da8b34/core/src/main/java/org/apache/iceberg/ManifestEntriesTable.java#L141-L142

Creating the Spark StructInternalRow representation using the incorrect schema (the full table schema, not the projected schema used in GenericManifestFile):
https://github.com/apache/iceberg/blob/c69da8a8c1c2f99de3a1b826514775f0f07bde72/spark/src/main/java/org/apache/iceberg/spark/source/RowDataReader.java#L189
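
To make the mismatch concrete, here is a minimal toy sketch in plain Java. It does not use any Iceberg or Spark classes: `ProjectionMismatchSketch`, `get`, the four-field stand-in for the data_file struct, and the sample values are all hypothetical, chosen only to show what happens when a positional row written with a pruned schema is read back with the full schema.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Iceberg's or Spark's actual row classes.
final class ProjectionMismatchSketch {

    // Looks up a field by name in a positional row, trusting that `schema`
    // describes the layout the row was written with.
    static Object get(List<String> schema, Object[] row, String field) {
        int pos = schema.indexOf(field);
        if (pos < 0 || pos >= row.length) {
            throw new IllegalArgumentException(
                "No value at position " + pos + " for field " + field);
        }
        return row[pos];
    }

    public static void main(String[] args) {
        // Assumed, shortened stand-in for the full data_file struct.
        List<String> fullSchema =
            Arrays.asList("content", "file_path", "file_format", "record_count");
        // The reader pruned data_file down to the two requested columns.
        List<String> projectedSchema = Arrays.asList("file_path", "record_count");

        // The row is laid out according to the projected schema.
        Object[] row = {"s3://bucket/a.parquet", 42L};

        // Reading with the schema the row was written with gives the right value.
        System.out.println(get(projectedSchema, row, "file_path"));

        // Reading with the full schema resolves file_path to position 1,
        // which actually holds record_count.
        System.out.println(get(fullSchema, row, "file_path"));

        // Positions past the end of the pruned row fail outright.
        try {
            get(fullSchema, row, "record_count");
        } catch (IllegalArgumentException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

In this toy model the fix is to hand the reader the same projected schema that was used to build the row, which mirrors the Source-side direction described above.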
