ConeyLiu commented on code in PR #7886:
URL: https://github.com/apache/iceberg/pull/7886#discussion_r1280075083
##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java:
##########
@@ -98,6 +110,18 @@ public class SparkV2Filters {
private SparkV2Filters() {}
+  public static Expression convert(Predicate[] predicates) {
+    Expression expression = Expressions.alwaysTrue();
+    for (Predicate predicate : predicates) {
+      Expression converted = convert(predicate);
+      Preconditions.checkArgument(
+          converted != null, "Cannot convert Spark predicate to Iceberg expression: %s", predicate);
Review Comment:
I checked the doc and implementation of `canDeleteWhere`/`deleteWhere`:
```java
/**
 * Checks whether it is possible to delete data from a data source table that matches filter
 * expressions.
 * <p>
 * Rows should be deleted from the data source iff all of the filter expressions match.
 * That is, the expressions must be interpreted as a set of filters that are ANDed together.
 * <p>
 * Spark will call this method at planning time to check whether {@link #deleteWhere(Predicate[])}
 * would reject the delete operation because it requires significant effort. If this method
 * returns false, Spark will not call {@link #deleteWhere(Predicate[])} and will try to rewrite
 * the delete operation and produce row-level changes if the data source table supports deleting
 * individual records.
 *
 * @param predicates V2 filter expressions, used to select rows to delete when all expressions
 *                   match
 * @return true if the delete operation can be performed
 *
 * @since 3.4.0
 */
default boolean canDeleteWhere(Predicate[] predicates) {
  return true;
}

/**
 * Delete data from a data source table that matches filter expressions. Note that this method
 * will be invoked only if {@link #canDeleteWhere(Predicate[])} returns true.
 * <p>
 * Rows are deleted from the data source iff all of the filter expressions match. That is, the
 * expressions must be interpreted as a set of filters that are ANDed together.
 * <p>
 * Implementations may reject a delete operation if the delete isn't possible without significant
 * effort. For example, partitioned data sources may reject deletes that do not filter by
 * partition columns because the filter may require rewriting files without deleted records.
 * To reject a delete implementations should throw {@link IllegalArgumentException} with a clear
 * error message that identifies which expression was rejected.
 *
 * @param predicates predicate expressions, used to select rows to delete when all expressions
 *                   match
 * @throws IllegalArgumentException If the delete is rejected due to required effort
 */
void deleteWhere(Predicate[] predicates);
```
And the implementation of `canDeleteWhere`:
```java
@Override
public boolean canDeleteWhere(Predicate[] predicates) {
  Preconditions.checkArgument(
      snapshotId == null, "Cannot delete from table at a specific snapshot: %s", snapshotId);

  Expression deleteExpr = Expressions.alwaysTrue();
  for (Predicate predicate : predicates) {
    Expression expr = SparkV2Filters.convert(predicate);
    if (expr != null) {
      deleteExpr = Expressions.and(deleteExpr, expr);
    } else {
      return false;
    }
  }

  return canDeleteUsingMetadata(deleteExpr);
}
```
So `Expression deleteExpr = SparkV2Filters.convert(predicates);` should only be called when every `predicate` can be translated into an Iceberg expression; otherwise the precondition check throws instead of letting `canDeleteWhere` return false.
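To illustrate the contract being discussed, here is a minimal, self-contained sketch of an array overload that returns `null` as soon as any single predicate fails to convert, so a `canDeleteWhere`-style caller can fall back instead of failing. The `Predicate` and `Expression` types below are simplified stand-ins for illustration only, not the real Spark or Iceberg classes:

```java
// Stand-in sketch (hypothetical types): the array overload of convert returns
// null when any predicate is unconvertible, mirroring how canDeleteWhere
// returns false so Spark can rewrite the delete as row-level changes.
public class ConvertSketch {

  // Simplified stand-in for Spark's Predicate: convertible iff supported.
  record Predicate(String name, boolean supported) {}

  // Simplified stand-in for an Iceberg Expression tree.
  record Expression(String repr) {
    static final Expression ALWAYS_TRUE = new Expression("true");

    static Expression and(Expression left, Expression right) {
      return new Expression("(" + left.repr + " AND " + right.repr + ")");
    }
  }

  // Converts one predicate; returns null if unsupported.
  static Expression convert(Predicate predicate) {
    return predicate.supported() ? new Expression(predicate.name()) : null;
  }

  // Array overload: returns null (rather than throwing) if any predicate
  // cannot be translated, so the caller can bail out cleanly.
  static Expression convert(Predicate[] predicates) {
    Expression result = Expression.ALWAYS_TRUE;
    for (Predicate predicate : predicates) {
      Expression converted = convert(predicate);
      if (converted == null) {
        return null; // caller treats this as "cannot delete by metadata"
      }
      result = Expression.and(result, converted);
    }
    return result;
  }

  public static void main(String[] args) {
    // All predicates convertible: prints the combined ANDed expression.
    Expression ok = convert(new Predicate[] {
        new Predicate("id = 1", true), new Predicate("ts < 10", true)});
    System.out.println(ok.repr());

    // One unconvertible predicate: prints null, signaling a fallback.
    Expression bad = convert(new Predicate[] {
        new Predicate("id = 1", true), new Predicate("udf(x)", false)});
    System.out.println(bad);
  }
}
```

Under this shape, throwing from `convert(Predicate[])` is reserved for genuine programming errors, while "unsupported predicate" stays an expected, recoverable outcome.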
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]