rdblue commented on a change in pull request #1352:
URL: https://github.com/apache/iceberg/pull/1352#discussion_r475749255



##########
File path: data/src/main/java/org/apache/iceberg/data/GenericReader.java
##########
@@ -114,14 +117,40 @@
     return records;
   }
 
-  private Schema fileProjection(boolean hasPosDeletes) {
-    if (hasPosDeletes) {
-      List<Types.NestedField> columns = Lists.newArrayList(projection.columns());
+  private Schema fileProjection(List<DeleteFile> posDeletes, List<DeleteFile> eqDeletes) {
+    Set<Integer> requiredIds = Sets.newLinkedHashSet();
+    if (!posDeletes.isEmpty()) {
+      requiredIds.add(MetadataColumns.ROW_POSITION.fieldId());
+    }
+
+    for (DeleteFile eqDelete : eqDeletes) {
+      requiredIds.addAll(eqDelete.equalityFieldIds());
+    }
+
+    Set<Integer> missingIds = Sets.newLinkedHashSet(Sets.difference(requiredIds, TypeUtil.getProjectedIds(projection)));
+
+    if (missingIds.isEmpty()) {
+      return projection;
+    }
+
+    // TODO: support adding nested columns. this will currently fail when finding nested columns to add
+    List<Types.NestedField> columns = Lists.newArrayList(projection.columns());
+    for (int fieldId : missingIds) {
+      if (fieldId == MetadataColumns.ROW_POSITION.fieldId()) {
+        continue; // add _pos at the end
+      }
+
+      Types.NestedField field = tableSchema.asStruct().field(fieldId);
+      Preconditions.checkArgument(field != null, "Cannot find required field for ID %s", fieldId);

Review comment:
       Is this a necessary constraint? As long as the old data files still
contain the columns that were dropped from the table, we can still apply the
deletes. Maybe we just need to change how this projection schema is built: we
could base it on the data file's schema instead of the table schema. If the
column is present in the data file's schema, the lookup would work fine.
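
A minimal sketch of that idea, using plain Java collections rather than the Iceberg API. The class name `FileProjectionSketch`, the map-based schema model (field ID to column name), and the sample column names are all assumptions made for illustration; nothing here is taken from the PR. The only point is that missing field IDs are resolved against the data file's schema, so a column that was dropped from the table but still exists in an old file can still be projected.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the projection logic; not Iceberg code.
class FileProjectionSketch {

  // Schemas are modeled as ordered maps of field ID -> column name (an assumption).
  static List<String> fileProjection(
      Map<Integer, String> projection,
      Map<Integer, String> dataFileSchema,
      Set<Integer> requiredIds) {
    List<String> columns = new ArrayList<>(projection.values());
    for (int fieldId : requiredIds) {
      if (projection.containsKey(fieldId)) {
        continue; // already projected
      }
      // Look the field up in the data file's schema instead of the table schema.
      String field = dataFileSchema.get(fieldId);
      if (field == null) {
        throw new IllegalArgumentException("Cannot find required field for ID " + fieldId);
      }
      columns.add(field);
    }
    return columns;
  }

  public static void main(String[] args) {
    Map<Integer, String> projection = new LinkedHashMap<>();
    projection.put(1, "id");
    projection.put(2, "data");

    // The old data file still has column 3, even if the table later dropped it.
    Map<Integer, String> dataFileSchema = new LinkedHashMap<>(projection);
    dataFileSchema.put(3, "legacy_key");

    Set<Integer> requiredIds = new LinkedHashSet<>();
    requiredIds.add(3); // e.g. an equality field ID from a delete file

    System.out.println(fileProjection(projection, dataFileSchema, requiredIds));
  }
}
```

With this shape, the precondition only fails when the data file itself lacks the column, which is the case where the delete could not be applied anyway.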






