holdenk commented on code in PR #7886:
URL: https://github.com/apache/iceberg/pull/7886#discussion_r1279806639
##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java:
##########
@@ -360,10 +425,57 @@ private static boolean hasNoInFilter(Predicate predicate) {
   }

   private static boolean isSupportedInPredicate(Predicate predicate) {
-    if (!isRef(childAtIndex(predicate, 0))) {
+    if (!canConvertToTerm(childAtIndex(predicate, 0))) {
       return false;
     } else {
       return Arrays.stream(predicate.children()).skip(1).allMatch(SparkV2Filters::isLiteral);
     }
   }
+
+  /** Should be called after {@link #canConvertToTerm} passed */
+  private static <T> UnboundTerm<Object> toTerm(T input) {
+    if (input instanceof NamedReference) {
+      return Expressions.ref(SparkUtil.toColumnName((NamedReference) input));
+    } else if (input instanceof UserDefinedScalarFunc) {
+      return udfToTerm((UserDefinedScalarFunc) input);
+    } else {
+      return null;
+    }
+  }
+
+  @SuppressWarnings("checkstyle:CyclomaticComplexity")
+  private static UnboundTerm<Object> udfToTerm(UserDefinedScalarFunc udf) {
+    org.apache.spark.sql.connector.expressions.Expression[] children = udf.children();
+    String udfName = udf.name().toLowerCase(Locale.ROOT);
+    if (children.length == 1) {
+      org.apache.spark.sql.connector.expressions.Expression child = children[0];
+      if (isRef(child)) {
Review Comment:
Would it make sense to also support literals here?
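For illustration, roughly what I have in mind for a two-argument UDF like truncate where one child is a literal. This is only a sketch: the argument order, the Integer cast, and reusing the existing isLiteral helper are my guesses, not the PR's code (Literal here is Spark's org.apache.spark.sql.connector.expressions.Literal):

    // sketch: accept a (ref, literal) pair, e.g. truncate(col, width)
    if (children.length == 2 && "truncate".equals(udfName)) {
      org.apache.spark.sql.connector.expressions.Expression ref = children[0];
      org.apache.spark.sql.connector.expressions.Expression lit = children[1];
      if (isRef(ref) && isLiteral(lit)) {
        // assumes the width literal is an Integer; real code would check the type
        int width = (Integer) ((Literal<?>) lit).value();
        return Expressions.truncate(SparkUtil.toColumnName((NamedReference) ref), width);
      }
    }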
##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkV2Filters.java:
##########
@@ -98,6 +110,18 @@ public class SparkV2Filters {

   private SparkV2Filters() {}

+  public static Expression convert(Predicate[] predicates) {
+    Expression expression = Expressions.alwaysTrue();
+    for (Predicate predicate : predicates) {
+      Expression converted = convert(predicate);
+      Preconditions.checkArgument(
+          converted != null, "Cannot convert Spark predicate to Iceberg expression: %s", predicate);
Review Comment:
Even if it's following what was done elsewhere, this means that if a
non-Iceberg UDF predicate got pushed down, it would fail to push down the
Iceberg expressions? Looking at the code, this only seems to be used
(currently) in deleteWhere, but I think the deleteWhere codepath should not
throw.
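For example, a non-throwing variant could signal failure with null and let the caller decide (just a sketch of the idea, not a tested change):

    public static Expression convert(Predicate[] predicates) {
      Expression expression = Expressions.alwaysTrue();
      for (Predicate predicate : predicates) {
        Expression converted = convert(predicate);
        if (converted == null) {
          // bail out instead of throwing; the caller can fall back
          return null;
        }
        expression = Expressions.and(expression, converted);
      }
      return expression;
    }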
##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java:
##########
@@ -359,8 +359,8 @@ private boolean canDeleteUsingMetadata(Expression deleteExpr) {
   }

   @Override
-  public void deleteWhere(Filter[] filters) {
-    Expression deleteExpr = SparkFilters.convert(filters);
+  public void deleteWhere(Predicate[] predicates) {
+    Expression deleteExpr = SparkV2Filters.convert(predicates);
Review Comment:
So this is the one case where we use convert on a list of predicates instead
of one at a time; since canDeleteWhere uses a for loop over the individual
expressions, we can get different results.
I think, given the use case of convert + a list of predicates, we should
change it to not throw, as originally suggested by @rdblue.
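Concretely, if convert returned null for unsupported predicates, canDeleteWhere could share the exact same conversion as deleteWhere, so the two paths couldn't diverge (a sketch only; the real canDeleteWhere body is more involved):

    @Override
    public boolean canDeleteWhere(Predicate[] predicates) {
      Expression deleteExpr = SparkV2Filters.convert(predicates);
      // null means some predicate couldn't be converted: refuse the metadata delete
      return deleteExpr != null && canDeleteUsingMetadata(deleteExpr);
    }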
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]