RussellSpitzer commented on a change in pull request #1744:
URL: https://github.com/apache/iceberg/pull/1744#discussion_r520047139



##########
File path: core/src/main/java/org/apache/iceberg/ManifestEntry.java
##########
@@ -54,7 +54,21 @@ static Schema getSchema(StructType partitionType) {
   }
 
   static Schema wrapFileSchema(StructType fileType) {
-    return new Schema(STATUS, SNAPSHOT_ID, SEQUENCE_NUMBER, 
required(DATA_FILE_ID, "data_file", fileType));
+    /*
+     * Used when determining the projection for ManifestReader, because the 
projection for data_file can be empty
+     * we need to explicitly remove the column if it is empty. If we leave it 
in the schema we end up pruning it out
+     * (because no columns within it are required) but still require it be 
present in the projection. This causes
+     * an exception to be raised.
+     *
+     * To fix this we skip requesting the DATA_FILE column when nothing is 
required from it.
+     *
+     * TODO Fix the projection in  ManifestReader itself to not just work with 
data_file columns
+     */
+    if (!fileType.fields().isEmpty()) {
+      return new Schema(STATUS, SNAPSHOT_ID, SEQUENCE_NUMBER, 
required(DATA_FILE_ID, "data_file", fileType));

Review comment:
       This is the first bug. This schema is used in the Manifest Reader 
.project call, when the data_file file type requested is empty then the column 
is ignored. But since it is there in the schema it is still required. This 
didn't break unpartitioned tables because of the second bug below.
   
   As mentioned in the comment, it would be better if we got the projection 
schema from a combination of only the columns required, rather than always 
assuming all 4 structs are required. So this is still not an ideal solution but 
doing more would require a more significant rewrite of ManifestReader




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to