holdenk commented on code in PR #7886:
URL: https://github.com/apache/iceberg/pull/7886#discussion_r1279806639
##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java:
##########
@@ -360,10 +425,57 @@ private static boolean hasNoInFilter(Predicate predicate) {
   }

   private static boolean isSupportedInPredicate(Predicate predicate) {
-    if (!isRef(childAtIndex(predicate, 0))) {
+    if (!canConvertToTerm(childAtIndex(predicate, 0))) {
       return false;
     } else {
       return Arrays.stream(predicate.children()).skip(1).allMatch(SparkV2Filters::isLiteral);
     }
   }
+
+  /** Should be called after {@link #canConvertToTerm} passed */
+  private static <T> UnboundTerm<Object> toTerm(T input) {
+    if (input instanceof NamedReference) {
+      return Expressions.ref(SparkUtil.toColumnName((NamedReference) input));
+    } else if (input instanceof UserDefinedScalarFunc) {
+      return udfToTerm((UserDefinedScalarFunc) input);
+    } else {
+      return null;
+    }
+  }
+
+  @SuppressWarnings("checkstyle:CyclomaticComplexity")
+  private static UnboundTerm<Object> udfToTerm(UserDefinedScalarFunc udf) {
+    org.apache.spark.sql.connector.expressions.Expression[] children = udf.children();
+    String udfName = udf.name().toLowerCase(Locale.ROOT);
+    if (children.length == 1) {
+      org.apache.spark.sql.connector.expressions.Expression child = children[0];
+      if (isRef(child)) {
Review Comment:
Would it make sense to also support literals here?
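For illustration, roughly what I have in mind for a two-argument UDF like truncate where one child is a literal. This is only a sketch: the argument order, the Integer cast, and reusing the existing isLiteral helper are my guesses, not the PR's code (Literal here is Spark's org.apache.spark.sql.connector.expressions.Literal):

    // sketch: accept a (ref, literal) pair, e.g. truncate(col, width)
    if (children.length == 2 && "truncate".equals(udfName)) {
      org.apache.spark.sql.connector.expressions.Expression ref = children[0];
      org.apache.spark.sql.connector.expressions.Expression lit = children[1];
      if (isRef(ref) && isLiteral(lit)) {
        // assumes the width literal is an Integer; real code would check the type
        int width = (Integer) ((Literal<?>) lit).value();
        return Expressions.truncate(SparkUtil.toColumnName((NamedReference) ref), width);
      }
    }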
##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java:
##########
@@ -98,6 +110,18 @@ public class SparkV2Filters {

   private SparkV2Filters() {}

+  public static Expression convert(Predicate[] predicates) {
+    Expression expression = Expressions.alwaysTrue();
+    for (Predicate predicate : predicates) {
+      Expression converted = convert(predicate);
+      Preconditions.checkArgument(
+          converted != null, "Cannot convert Spark predicate to Iceberg expression: %s", predicate);
Review Comment:
Even if it's following what was done elsewhere, this means that if a
non-Iceberg UDF predicate got pushed down, it would fail to push down the
Iceberg expressions? Looking at the code, this only seems to be used
(currently) in deleteWhere, but I think the deleteWhere codepath should not
throw.
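For example, a non-throwing variant could signal failure with null and let the caller decide (just a sketch of the idea, not a tested change):

    public static Expression convert(Predicate[] predicates) {
      Expression expression = Expressions.alwaysTrue();
      for (Predicate predicate : predicates) {
        Expression converted = convert(predicate);
        if (converted == null) {
          // bail out instead of throwing; the caller can fall back
          return null;
        }
        expression = Expressions.and(expression, converted);
      }
      return expression;
    }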
##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java:
##########
@@ -359,8 +359,8 @@ private boolean canDeleteUsingMetadata(Expression deleteExpr) {
   }

   @Override
-  public void deleteWhere(Filter[] filters) {
-    Expression deleteExpr = SparkFilters.convert(filters);
+  public void deleteWhere(Predicate[] predicates) {
+    Expression deleteExpr = SparkV2Filters.convert(predicates);
Review Comment:
So this is the one case where we use convert on a list of predicates instead
of one at a time; since canDeleteWhere uses a for loop over the individual
expressions, we can get different results.
I think, given the use case of convert + a list of predicates, we should
change it to not throw, as originally suggested by @rdblue.
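Concretely, if convert returned null for unsupported predicates, canDeleteWhere could share the exact same conversion as deleteWhere, so the two paths couldn't diverge (a sketch only; the real canDeleteWhere body is more involved):

    @Override
    public boolean canDeleteWhere(Predicate[] predicates) {
      Expression deleteExpr = SparkV2Filters.convert(predicates);
      // null means some predicate couldn't be converted: refuse the metadata delete
      return deleteExpr != null && canDeleteUsingMetadata(deleteExpr);
    }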
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]