dramaticlly commented on code in PR #8560:
URL: https://github.com/apache/iceberg/pull/8560#discussion_r1326553809
##########
spark/v3.4/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkExpressionConverter.scala:
##########
@@ -28,14 +28,15 @@ import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.plans.logical.Filter
import org.apache.spark.sql.catalyst.plans.logical.LeafNode
import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy
object SparkExpressionConverter {
   def convertToIcebergExpression(sparkExpression: Expression): org.apache.iceberg.expressions.Expression = {
     // Currently, it is a double conversion as we are converting Spark expression to Spark filter
     // and then converting Spark filter to Iceberg expression.
     // But these two conversions already exist and are well tested. So, we are going with this approach.
-    SparkFilters.convert(DataSourceStrategy.translateFilter(sparkExpression, supportNestedPredicatePushdown = true).get)
+    SparkV2Filters.convert(DataSourceV2Strategy.translateFilterV2(sparkExpression).get)
Review Comment:
Thank you @ConeyLiu, I will rebase my change if #8394 gets merged first. I also added comments where we now convert the Spark catalyst expression to a Predicate instead of a Spark source filter, and I ran all the unit tests to make sure the old filters work as expected.
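For context, here is a minimal sketch of the new conversion path this diff introduces (catalyst expression -> V2 `Predicate` -> Iceberg expression), written defensively around the `Option`/null results; the helper name and error messages are illustrative, not part of the PR:

```scala
import org.apache.iceberg.spark.SparkV2Filters
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy

// Sketch: translateFilterV2 returns an Option, and SparkV2Filters.convert
// may return null for predicates it cannot handle, so both steps are
// checked here instead of calling a bare .get as the one-liner does.
def convertToIcebergExpressionSafely(
    sparkExpression: Expression): org.apache.iceberg.expressions.Expression = {
  val v2Predicate = DataSourceV2Strategy
    .translateFilterV2(sparkExpression)
    .getOrElse(throw new IllegalArgumentException(
      s"Cannot translate Spark expression to a V2 predicate: $sparkExpression"))
  Option(SparkV2Filters.convert(v2Predicate))
    .getOrElse(throw new IllegalArgumentException(
      s"Cannot convert V2 predicate to an Iceberg expression: $v2Predicate"))
}
```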
##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java:
##########
@@ -260,7 +285,7 @@ public void testBinPackAfterPartitionChange() {
}
@Test
-  public void testBinPackWithDeletes() throws Exception {
+  public void testBinPackWithDeletes() {
Review Comment:
Yes, IntelliJ suggested this during code analysis, and I don't think we actually throw here.
##########
spark/v3.4/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java:
##########
@@ -480,7 +511,10 @@ public void testRewriteDataFilesWithAllPossibleFilters() {
     sql(
         "CALL %s.system.rewrite_data_files(table => '%s'," + " where => 'c2 like \"%s\"')",
         catalogName, tableIdent, "car%");
-
+ // StringStartsWith
Review Comment:
Yeah, good catch. Initially I was about to add a filter here for the bucket transform (and forgot to change it), but I ended up creating a new method to test that all V2 filters can be evaluated without exceptions.
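Roughly, that new method looks like the sketch below: it loops a set of where-clauses through the procedure and checks that none of them throws. The clauses, catalog, and table names here are placeholders, not the actual test code:

```scala
// Hypothetical sketch: exercise one where-clause per V2 filter shape
// through the rewrite_data_files procedure; any unsupported pushdown
// would surface as an exception from spark.sql.
val whereClauses = Seq(
  "c2 like \"car%\"",        // StringStartsWith
  "c1 = 1",                  // EqualTo
  "c1 in (1, 2, 3)",         // In
  "c2 is not null",          // IsNotNull
  "system.bucket(2, c3) = 0" // bucket transform pushdown
)
whereClauses.foreach { where =>
  spark.sql(
    s"CALL $catalogName.system.rewrite_data_files(" +
      s"table => '$tableIdent', where => '$where')")
}
```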
##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java:
##########
@@ -223,6 +223,31 @@ public void testBinPackWithFilter() {
assertEquals("Rows must match", expectedRecords, actualRecords);
}
+ @Test
+ public void testBinPackWithFilterOnBucketExpression() {
+    PartitionSpec spec = PartitionSpec.builderFor(SCHEMA).bucket("c3", 2).build();
Review Comment:
Thank you Coney, your test utils class is super helpful. However, I realized that these SparkAction tests assume a Hadoop catalog, so the table creation is a bit different since it's done by table location (see https://github.com/apache/iceberg/blob/master/spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java#L1563). I opted to use your `SystemFunctionPushDownHelper` in TestRewriteDataFilesProcedure instead.
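For reference, a minimal sketch of the location-based creation those action tests rely on, assuming the Hadoop tables API and the c1/c2/c3 schema used in this suite; the location path is a placeholder:

```scala
import java.util.Collections

import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.{PartitionSpec, Schema}
import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.types.Types

// Hadoop tables are keyed by path rather than by a catalog name,
// so the table is created directly at a file location.
val schema = new Schema(
  Types.NestedField.optional(1, "c1", Types.IntegerType.get()),
  Types.NestedField.optional(2, "c2", Types.StringType.get()),
  Types.NestedField.optional(3, "c3", Types.StringType.get()))
val spec = PartitionSpec.builderFor(schema).bucket("c3", 2).build()
val tableLocation = "file:/tmp/iceberg-rewrite-test" // placeholder path
val table = new HadoopTables(new Configuration())
  .create(schema, spec, Collections.emptyMap[String, String](), tableLocation)
```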
##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java:
##########
@@ -1606,6 +1631,20 @@ private Table createTypeTestTable() {
return table;
}
+ private void insertData(int recordCount, int numDataFiles) {
Review Comment:
Dropped this function since we don't need it.