flyrain commented on code in PR #4683:
URL: https://github.com/apache/iceberg/pull/4683#discussion_r875479043
##########
data/src/main/java/org/apache/iceberg/data/DeleteFilter.java:
##########
@@ -290,8 +295,6 @@ private static Schema fileProjection(Schema tableSchema,
Schema requestedSchema,
requiredIds.addAll(eqDelete.equalityFieldIds());
}
- requiredIds.add(MetadataColumns.IS_DELETED.fieldId());
Review Comment:
We project the pos column only if there are pos deletes, as the following
code shows. That makes sense, since the column is only needed to filter pos deletes.
```
if (!posDeletes.isEmpty()) {
  requiredIds.add(MetadataColumns.ROW_POSITION.fieldId());
}
```
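Below is a minimal, self-contained sketch of the projection rule described above, written in plain Java with no Iceberg dependency. The class, the `requiredIds` helper, and the placeholder field ID are hypothetical stand-ins for `fileProjection` and `MetadataColumns.ROW_POSITION`. This is not the actual `DeleteFilter` code. The row position column is added only when there are pos deletes to apply.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, self-contained sketch; not Iceberg's DeleteFilter code.
class PosProjectionSketch {
  // Placeholder ID standing in for MetadataColumns.ROW_POSITION.fieldId().
  static final int ROW_POSITION_ID = Integer.MAX_VALUE - 2;

  // Field IDs to read from a data file: the requested columns, every
  // equality-delete key column, and the row position only when pos deletes exist.
  static Set<Integer> requiredIds(
      Set<Integer> requestedIds, List<Set<Integer>> eqDeleteFieldIds, boolean hasPosDeletes) {
    Set<Integer> required = new HashSet<>(requestedIds);
    for (Set<Integer> ids : eqDeleteFieldIds) {
      required.addAll(ids);
    }
    if (hasPosDeletes) {
      // Positions are only needed to match rows against positional deletes.
      required.add(ROW_POSITION_ID);
    }
    return required;
  }

  public static void main(String[] args) {
    Set<Integer> noPos = requiredIds(Set.of(1, 2), List.of(Set.of(3)), false);
    Set<Integer> withPos = requiredIds(Set.of(1, 2), List.of(), true);
    System.out.println(noPos.contains(ROW_POSITION_ID));   // false
    System.out.println(withPos.contains(ROW_POSITION_ID)); // true
  }
}
```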
Here is my thought on the IS_DELETED column: it should be present only if the
front end (e.g. a Spark read) asks for it. For example, in the CDC case, we put
it in the filter to read deleted rows. Here is the code from my CDC draft PR #4539.
```
Dataset<Row> scanDF = spark().read().format("iceberg")
    .option(SparkReadOptions.FILE_SCAN_TASK_SET_ID, groupID)
    .load(table.name())
    .filter(functions.column(MetadataColumns.IS_DELETED.name()).equalTo(true));
```
What do you think?
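To make the proposal concrete, here is a hypothetical sketch in the same style. It is not Iceberg code; the class, the `requiredIds` helper, and the placeholder IDs for `ROW_POSITION` and `IS_DELETED` are assumptions. With the unconditional `requiredIds.add(MetadataColumns.IS_DELETED.fieldId())` removed, the deleted flag is read only when the caller's requested schema already includes it, as the CDC filter does.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the proposal; not Iceberg's actual DeleteFilter code.
class IsDeletedProjectionSketch {
  // Placeholder IDs standing in for MetadataColumns.ROW_POSITION / IS_DELETED.
  static final int ROW_POSITION_ID = Integer.MAX_VALUE - 2;
  static final int IS_DELETED_ID = Integer.MAX_VALUE - 3;

  static Set<Integer> requiredIds(Set<Integer> requestedIds, boolean hasPosDeletes) {
    Set<Integer> required = new HashSet<>(requestedIds);
    if (hasPosDeletes) {
      required.add(ROW_POSITION_ID);
    }
    // IS_DELETED is never added here: it survives only if the reader
    // (e.g. a CDC scan filtering on it) put it in the requested schema.
    return required;
  }

  public static void main(String[] args) {
    // A plain read that never mentions the deleted flag.
    Set<Integer> plain = requiredIds(Set.of(1, 2), true);
    // A CDC-style read that requests the flag in order to filter on it.
    Set<Integer> cdc = requiredIds(Set.of(1, 2, IS_DELETED_ID), true);
    System.out.println(plain.contains(IS_DELETED_ID)); // false
    System.out.println(cdc.contains(IS_DELETED_ID));   // true
  }
}
```

This mirrors the pos-column rule: each metadata column is projected only when some consumer actually needs it.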