[GitHub] [iceberg] rdblue commented on a change in pull request #3661: Spark: Implement copy-on-write DELETE

GitBox Thu, 16 Dec 2021 08:45:42 -0800


rdblue commented on a change in pull request #3661:
URL: https://github.com/apache/iceberg/pull/3661#discussion_r770728274




##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/TestSparkDistributionAndOrderingUtil.java
##########
@@ -296,6 +299,285 @@ public void testRangeWritePartitionedSortedTable() {
     checkWriteDistributionAndOrdering(table, expectedDistribution, expectedOrdering);
   }
 
+  @Test
+  public void testDefaultCopyOnWriteDeleteUnpartitionedUnsortedTable() {
+    sql("CREATE TABLE %s (id bigint, data string) USING iceberg", tableName);
+
+    Table table = validationCatalog.loadTable(tableIdent);
+
+    Distribution expectedDistribution = Distributions.unspecified();
+    SortOrder[] expectedOrdering = new SortOrder[]{};
+    checkCopyOnWriteDeleteDistributionAndOrdering(table, expectedDistribution, expectedOrdering);
+  }
+
+  @Test
+  public void testHashCopyOnWriteDeleteUnpartitionedUnsortedTable() {
+    sql("CREATE TABLE %s (id bigint, data string) USING iceberg", tableName);
+
+    Table table = validationCatalog.loadTable(tableIdent);
+
+    table.updateProperties()
+        .set(DELETE_DISTRIBUTION_MODE, WRITE_DISTRIBUTION_MODE_HASH)
+        .commit();
+
+    Expression[] expectedClustering = new Expression[]{
+        Expressions.column(MetadataColumns.FILE_PATH.name()),
+    };
+    Distribution expectedDistribution = Distributions.clustered(expectedClustering);
+
+    SortOrder[] expectedOrdering = new SortOrder[]{
+        Expressions.sort(Expressions.column(MetadataColumns.FILE_PATH.name()), SortDirection.ASCENDING),
+        Expressions.sort(Expressions.column(MetadataColumns.ROW_POSITION.name()), SortDirection.ASCENDING)
+    };
+
+    checkCopyOnWriteDeleteDistributionAndOrdering(table, expectedDistribution, expectedOrdering);
+  }
+
+  @Test
+  public void testRangeCopyOnWriteDeleteUnpartitionedUnsortedTable() {
+    sql("CREATE TABLE %s (id bigint, data string) USING iceberg", tableName);
+
+    Table table = validationCatalog.loadTable(tableIdent);
+
+    table.updateProperties()
+        .set(DELETE_DISTRIBUTION_MODE, WRITE_DISTRIBUTION_MODE_RANGE)
+        .commit();
+
+    Expression[] expectedClustering = new Expression[]{
+        Expressions.column(MetadataColumns.FILE_PATH.name()),
+    };
+    Distribution expectedDistribution = Distributions.clustered(expectedClustering);
+
+    SortOrder[] expectedOrdering = new SortOrder[]{
+        Expressions.sort(Expressions.column(MetadataColumns.FILE_PATH.name()), SortDirection.ASCENDING),
+        Expressions.sort(Expressions.column(MetadataColumns.ROW_POSITION.name()), SortDirection.ASCENDING)
+    };
+
+    checkCopyOnWriteDeleteDistributionAndOrdering(table, expectedDistribution, expectedOrdering);

Review comment:
This is okay. I noted above that I'd probably lean toward respecting RANGE. But hash with _file and _pos seems reasonable.
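
The two options weighed in this comment can be sketched as a small, self-contained model. This is only an illustration under stated assumptions, not the Iceberg or Spark API: `Mode`, `Plan`, `planFor` and the `respectRange` flag are hypothetical names, `NONE` models the expectation of the default test above, and reading "respecting RANGE" as a range distribution on `_file` and `_pos` is an assumption. `_file` and `_pos` are the names of `MetadataColumns.FILE_PATH` and `MetadataColumns.ROW_POSITION`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DeleteDistributionSketch {

  // Hypothetical stand-in for the table's delete distribution mode.
  enum Mode { NONE, HASH, RANGE }

  // Hypothetical description of what a copy-on-write DELETE requests from the engine.
  static final class Plan {
    final String distribution;   // "unspecified", "clustered" or "ordered"
    final List<String> keys;     // clustering keys or range keys
    final List<String> ordering; // sort applied within each write task

    Plan(String distribution, List<String> keys, List<String> ordering) {
      this.distribution = distribution;
      this.keys = keys;
      this.ordering = ordering;
    }

    @Override
    public String toString() {
      return distribution + keys + " sorted by " + ordering;
    }
  }

  static Plan planFor(Mode mode, boolean respectRange) {
    List<String> file = Collections.singletonList("_file");
    List<String> fileAndPos = Arrays.asList("_file", "_pos");

    if (mode == Mode.NONE) {
      // Matches the default test: no distribution and no ordering requested.
      return new Plan("unspecified",
          Collections.<String>emptyList(), Collections.<String>emptyList());
    }

    if (mode == Mode.RANGE && respectRange) {
      // Assumed alternative: range-distribute on the same keys used for sorting.
      return new Plan("ordered", fileAndPos, fileAndPos);
    }

    // HASH, and RANGE as the tests in this diff expect it:
    // cluster by _file, then sort by _file and _pos within each task.
    return new Plan("clustered", file, fileAndPos);
  }

  public static void main(String[] args) {
    System.out.println(planFor(Mode.RANGE, false)); // behavior asserted by the test
    System.out.println(planFor(Mode.RANGE, true));  // assumed "respect RANGE" variant
  }
}
```

In this model the only difference between the two RANGE variants is the requested distribution; the per-task ordering by `_file` and `_pos` is the same in both.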




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



