ConeyLiu commented on code in PR #7886:
URL: https://github.com/apache/iceberg/pull/7886#discussion_r1280075083
##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java:
##########
@@ -98,6 +110,18 @@ public class SparkV2Filters {
private SparkV2Filters() {}
+  public static Expression convert(Predicate[] predicates) {
+    Expression expression = Expressions.alwaysTrue();
+    for (Predicate predicate : predicates) {
+      Expression converted = convert(predicate);
+      Preconditions.checkArgument(
+          converted != null, "Cannot convert Spark predicate to Iceberg expression: %s", predicate);
Review Comment:
I checked the doc and implementation of `canDeleteWhere`/`deleteWhere`:
```java
/**
 * Checks whether it is possible to delete data from a data source table that matches filter
 * expressions.
 * <p>
 * Rows should be deleted from the data source iff all of the filter expressions match.
 * That is, the expressions must be interpreted as a set of filters that are ANDed together.
 * <p>
 * Spark will call this method at planning time to check whether {@link #deleteWhere(Predicate[])}
 * would reject the delete operation because it requires significant effort. If this method
 * returns false, Spark will not call {@link #deleteWhere(Predicate[])} and will try to rewrite
 * the delete operation and produce row-level changes if the data source table supports deleting
 * individual records.
 *
 * @param predicates V2 filter expressions, used to select rows to delete when all expressions
 *                   match
 * @return true if the delete operation can be performed
 *
 * @since 3.4.0
 */
default boolean canDeleteWhere(Predicate[] predicates) {
  return true;
}

/**
 * Delete data from a data source table that matches filter expressions. Note that this method
 * will be invoked only if {@link #canDeleteWhere(Predicate[])} returns true.
 * <p>
 * Rows are deleted from the data source iff all of the filter expressions match. That is, the
 * expressions must be interpreted as a set of filters that are ANDed together.
 * <p>
 * Implementations may reject a delete operation if the delete isn't possible without significant
 * effort. For example, partitioned data sources may reject deletes that do not filter by
 * partition columns because the filter may require rewriting files without deleted records.
 * To reject a delete implementations should throw {@link IllegalArgumentException} with a clear
 * error message that identifies which expression was rejected.
 *
 * @param predicates predicate expressions, used to select rows to delete when all expressions
 *                   match
 * @throws IllegalArgumentException If the delete is rejected due to required effort
 */
void deleteWhere(Predicate[] predicates);
```
And the implementation of `canDeleteWhere`:
```java
@Override
public boolean canDeleteWhere(Predicate[] predicates) {
  Preconditions.checkArgument(
      snapshotId == null, "Cannot delete from table at a specific snapshot: %s", snapshotId);

  Expression deleteExpr = Expressions.alwaysTrue();
  for (Predicate predicate : predicates) {
    Expression expr = SparkV2Filters.convert(predicate);
    if (expr != null) {
      deleteExpr = Expressions.and(deleteExpr, expr);
    } else {
      return false;
    }
  }

  return canDeleteUsingMetadata(deleteExpr);
}
```
So `Expression deleteExpr = SparkV2Filters.convert(predicates);` should only be called when every `predicate` can be translated into an Iceberg expression; otherwise the precondition check throws instead of letting `canDeleteWhere` return false.
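To illustrate the contract being discussed, here is a minimal, self-contained sketch of an array overload that returns `null` as soon as any single predicate fails to convert, so a `canDeleteWhere`-style caller can fall back instead of failing. The `Predicate` and `Expression` types below are simplified stand-ins for illustration only, not the real Spark or Iceberg classes:

```java
// Stand-in sketch (hypothetical types): the array overload of convert returns
// null when any predicate is unconvertible, mirroring how canDeleteWhere
// returns false so Spark can rewrite the delete as row-level changes.
public class ConvertSketch {

  // Simplified stand-in for Spark's Predicate: convertible iff supported.
  record Predicate(String name, boolean supported) {}

  // Simplified stand-in for an Iceberg Expression tree.
  record Expression(String repr) {
    static final Expression ALWAYS_TRUE = new Expression("true");

    static Expression and(Expression left, Expression right) {
      return new Expression("(" + left.repr + " AND " + right.repr + ")");
    }
  }

  // Converts one predicate; returns null if unsupported.
  static Expression convert(Predicate predicate) {
    return predicate.supported() ? new Expression(predicate.name()) : null;
  }

  // Array overload: returns null (rather than throwing) if any predicate
  // cannot be translated, so the caller can bail out cleanly.
  static Expression convert(Predicate[] predicates) {
    Expression result = Expression.ALWAYS_TRUE;
    for (Predicate predicate : predicates) {
      Expression converted = convert(predicate);
      if (converted == null) {
        return null; // caller treats this as "cannot delete by metadata"
      }
      result = Expression.and(result, converted);
    }
    return result;
  }

  public static void main(String[] args) {
    // All predicates convertible: prints the combined ANDed expression.
    Expression ok = convert(new Predicate[] {
        new Predicate("id = 1", true), new Predicate("ts < 10", true)});
    System.out.println(ok.repr());

    // One unconvertible predicate: prints null, signaling a fallback.
    Expression bad = convert(new Predicate[] {
        new Predicate("id = 1", true), new Predicate("udf(x)", false)});
    System.out.println(bad);
  }
}
```

Under this shape, throwing from `convert(Predicate[])` is reserved for genuine programming errors, while "unsupported predicate" stays an expected, recoverable outcome.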
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]