ludlows commented on code in PR #6760:
URL: https://github.com/apache/iceberg/pull/6760#discussion_r1133094571
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java:
##########
@@ -129,13 +129,14 @@ public InternalRow[] call(InternalRow args) {
   private RewriteDataFiles checkAndApplyFilter(
       RewriteDataFiles action, String where, String tableName) {
     if (where != null) {
-      try {
-        Expression expression =
-            SparkExpressionConverter.collectResolvedSparkExpression(spark(), tableName, where);
-        return action.filter(SparkExpressionConverter.convertToIcebergExpression(expression));
-      } catch (AnalysisException e) {
-        throw new IllegalArgumentException("Cannot parse predicates in where option: " + where);
+      Option<Expression> expressionOption =
+          SparkExpressionConverter.collectResolvedSparkExpressionOption(spark(), tableName, where);
+      if (expressionOption.isEmpty()) {
+        return action.filter(Expressions.alwaysFalse());
Review Comment:
Thank you for the review.
Yes, we can make the function return an `Expression`.
However, we then need to check the `where` value in the main function.
The rewriteDataFilesAction rewrites all data files when `where` is `null`,
and we want to rewrite nothing when `where` evaluates to `false`.
We cannot distinguish these two cases if the function `filter` returns
`null` and the caller has not already checked `where`.
I would like to propose the implementation below.
In the main function:
```java
@Override
public InternalRow[] call(InternalRow args) {
  Identifier tableIdent = toIdentifier(args.getString(0), PARAMETERS[0].name());

  return modifyIcebergTable(
      tableIdent,
      table -> {
        String quotedFullIdentifier =
            Spark3Util.quotedFullIdentifier(tableCatalog().name(), tableIdent);
        RewriteDataFiles action = actions().rewriteDataFiles(table);

        String strategy = args.isNullAt(1) ? null : args.getString(1);
        String sortOrderString = args.isNullAt(2) ? null : args.getString(2);
        if (strategy != null || sortOrderString != null) {
          action = checkAndApplyStrategy(action, strategy, sortOrderString, table.schema());
        }

        if (!args.isNullAt(3)) {
          action = checkAndApplyOptions(args, action);
        }

        String where = args.isNullAt(4) ? null : args.getString(4);
        if (where != null) {
          Expression expression = filter(where, quotedFullIdentifier);
          if (expression == null) {
            // terminate immediately
            RewriteDataFiles.Result result =
                new BaseRewriteDataFilesResult(Lists.newArrayList());
            return toOutputRows(result);
          }
          action = action.filter(SparkExpressionConverter.convertToIcebergExpression(expression));
        }

        RewriteDataFiles.Result result = action.execute();
        return toOutputRows(result);
      });
}
```
In the helper function `filter`:
```java
private Expression filter(String where, String tableName) {
  Option<Expression> expressionOption =
      SparkExpressionConverter.collectResolvedSparkExpressionOption(spark(), tableName, where);
  if (expressionOption.isEmpty()) {
    return null;
  }
  return expressionOption.get();
}
```
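To make the three cases explicit, here is a minimal standalone sketch of the control flow. It does not use any Iceberg or Spark types: `WhereFilterSketch`, `Plan`, and `resolve` are hypothetical names, and `resolve` only stands in for `collectResolvedSparkExpressionOption` under the assumption that an empty result means the predicate evaluates to `false`.

```java
import java.util.Optional;

public class WhereFilterSketch {
  // Hypothetical outcomes; these are not Iceberg types.
  enum Plan { REWRITE_ALL, REWRITE_NOTHING, REWRITE_MATCHING }

  // Hypothetical stand-in for resolving the predicate. An empty Optional
  // models a predicate that is known to evaluate to false (assumption).
  static Optional<String> resolve(String where) {
    String trimmed = where.trim();
    if (trimmed.equalsIgnoreCase("false") || trimmed.equals("1 = 0")) {
      return Optional.empty();
    }
    return Optional.of(trimmed);
  }

  // The null check happens here, in the caller, so "no where clause"
  // and "where clause that is false" never collapse into the same value.
  static Plan plan(String where) {
    if (where == null) {
      return Plan.REWRITE_ALL;
    }
    return resolve(where).isPresent() ? Plan.REWRITE_MATCHING : Plan.REWRITE_NOTHING;
  }

  public static void main(String[] args) {
    System.out.println(plan(null));     // prints REWRITE_ALL
    System.out.println(plan("1 = 0"));  // prints REWRITE_NOTHING
    System.out.println(plan("id > 5")); // prints REWRITE_MATCHING
  }
}
```

The point of the sketch is only that the caller checks `where` before calling the helper, so a `null` from the helper can be read unambiguously as "rewrite nothing".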
What do you think?
Thanks!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]