RussellSpitzer commented on code in PR #4902:
URL: https://github.com/apache/iceberg/pull/4902#discussion_r916887269


##########
spark/v3.2/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java:
##########
@@ -133,6 +145,43 @@ public void testRewriteDataFilesWithSortStrategy() {
     assertEquals("Data after compaction should not change", expectedRecords, actualRecords);
   }
 
+  @Test
+  public void testRewriteDataFilesWithZOrder() {
+    createTable();
+    // create 10 files under non-partitioned table
+    insertData(10);
+    List<Object[]> expectedRecords = currentData();
+
+    // set z_order = c1,c2
+    List<Object[]> output = sql(
+        "CALL %s.system.rewrite_data_files(table => '%s', " +
+        "strategy => 'sort', sort_order => 'zorder(c1,c2)')",
+        catalogName, tableIdent);
+
+    assertEquals("Action should rewrite 10 data files and add 1 data files",
+        ImmutableList.of(row(10, 1)),
+        output);
+
+    List<Object[]> actualRecords = currentData();
+    assertEquals("Data after compaction should not change", expectedRecords, actualRecords);
+
+    // Due to Z_order, the data written will be in the below order.
+    // As there is only one small output file, we can validate the query ordering (as it will not change).
+    ImmutableList<Object[]> expectedRows = ImmutableList.of(

Review Comment:
   The principle I would go for here is "property testing": instead of 
attempting to assert an absolute ("this operation provides this order"), we 
assert something like "this operation provides an order that is different from 
another order". That way we can change the algorithm and this test doesn't have 
to change, since it doesn't actually check the correctness of the algorithm; it 
only checks whether something happened.
   
   So in this case we could check that the order of the data is different from 
the hierarchically sorted data and also different from the original ordering 
of the data (without any sort or zorder).
   
   That said, we can always skip this for now, but in general I try to avoid 
tests with absolute answers when we aren't trying to make sure that we get that 
specific answer in the test.
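   
   A rough sketch of what such a check could look like, assuming the rows come 
back as `List<Object[]>` and that the first two columns are the integer columns 
`c1` and `c2`. The class and method names here are hypothetical and not part of 
the existing test utilities:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the Iceberg test utilities.
public class OrderPropertyCheck {

  // True when both lists hold equal rows at every position.
  static boolean sameOrder(List<Object[]> left, List<Object[]> right) {
    if (left.size() != right.size()) {
      return false;
    }
    for (int i = 0; i < left.size(); i++) {
      if (!Arrays.equals(left.get(i), right.get(i))) {
        return false;
      }
    }
    return true;
  }

  // Hierarchical sort: by c1 first, then by c2.
  // Assumption: columns 0 and 1 are non-null Integers.
  static List<Object[]> hierarchicalSort(List<Object[]> rows) {
    List<Object[]> sorted = new ArrayList<>(rows);
    sorted.sort(
        Comparator.<Object[]>comparingInt(r -> (Integer) r[0])
            .thenComparingInt(r -> (Integer) r[1]));
    return sorted;
  }

  // The property: the rewritten order is neither the original insertion
  // order nor the plain hierarchical sort of the same rows.
  static boolean differsFromBaselines(List<Object[]> original, List<Object[]> rewritten) {
    return !sameOrder(rewritten, original)
        && !sameOrder(rewritten, hierarchicalSort(original));
  }
}
```

   In the test, `original` would be the rows read before the procedure call and 
`rewritten` the rows read afterwards, so the assertion keeps holding if the 
zorder implementation changes, as long as it still reorders the data.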



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

