RussellSpitzer commented on a change in pull request #1744:
URL: https://github.com/apache/iceberg/pull/1744#discussion_r520047139
##########
File path: core/src/main/java/org/apache/iceberg/ManifestEntry.java
##########
@@ -54,7 +54,21 @@ static Schema getSchema(StructType partitionType) {
}
static Schema wrapFileSchema(StructType fileType) {
- return new Schema(STATUS, SNAPSHOT_ID, SEQUENCE_NUMBER,
required(DATA_FILE_ID, "data_file", fileType));
+ /*
+ * Used when determining the projection for ManifestReader, because the
projection for data_file can be empty
+ * we need to explicitly remove the column if it is empty. If we leave it
in the schema we end up pruning it out
+ * (because no columns within it are required) but still require it be
present in the projection. This causes
+ * an exception to be raised.
+ *
+ * To fix this we skip requesting the DATA_FILE column when nothing is
required from it.
+ *
+ * TODO Fix the projection in ManifestReader itself to not just work with
data_file columns
+ */
+ if (!fileType.fields().isEmpty()) {
+ return new Schema(STATUS, SNAPSHOT_ID, SEQUENCE_NUMBER,
required(DATA_FILE_ID, "data_file", fileType));
Review comment:
This is the first bug. This schema is used in the Manifest Reader
.project call, when the data_file file type requested is empty then the column
is ignored. But since it is there in the schema it is still required. This
didn't break unpartitioned tables because of the second bug below.
As mentioned in the comment, it would be better if we got the projection
schema from a combination of only the columns required, rather than always
assuming all 4 structs are required. So this is still not an ideal solution but
doing more would require a more significant rewrite of ManifestReader
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]