RussellSpitzer commented on code in PR #8560:
URL: https://github.com/apache/iceberg/pull/8560#discussion_r1326109708


##########
spark/v3.4/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java:
##########
@@ -395,6 +395,37 @@ public void testRewriteDataFilesWithFilterOnPartitionTable() {
     assertEquals("Data after compaction should not change", expectedRecords, actualRecords);
   }
 
+  @Test
+  public void testRewriteDataFilesWithFilterOnOnBucketExpression() {
+    // The schema `system` cannot be found in spark_catalog
+    Assume.assumeFalse(catalogName.equals(SparkCatalogConfig.SPARK.catalogName()));
+    createBucketPartitionTable();
+    // create 5 files for each partition (c2 = 'foo' and c2 = 'bar')
+    insertData(10);
+    List<Object[]> expectedRecords = currentData();
+
+    // select only 5 files for compaction (files in the partition c2 = 'bar')
+    List<Object[]> output =
+        sql(
+            "CALL %s.system.rewrite_data_files(table => '%s',"
+                + " where => '%s.system.bucket(2, c2) = 0')",
+            catalogName, tableIdent, catalogName);
+
+    assertEquals(
+        "Action should rewrite 5 data files from single matching partition"
+            + "(containing c2 = bar) and add 1 data files",
+        row(5, 1),
+        Arrays.copyOf(output.get(0), 2));

Review Comment:
   nit: we could just use row(..., ...) again so they match a little more closely :)
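
   For illustration, a minimal sketch of one way that suggestion could read, assuming the `row(...)` varargs helper and the `output` list already used in the test above; the first two columns of the procedure output are still the ones compared, just built with `row` on both sides instead of `Arrays.copyOf`:

   ```java
   // Hypothetical variant of the assertion above (a sketch, not the PR's actual code):
   // construct the actual value with row(...) as well, so expected and actual are
   // built symmetrically and the assertion reads the same on both sides.
   Object[] firstRow = output.get(0);
   assertEquals(
       "Action should rewrite 5 data files from the single matching partition"
           + " (containing c2 = bar) and add 1 data file",
       row(5, 1),
       row(firstRow[0], firstRow[1]));
   ```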



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

