aokolnychyi commented on a change in pull request #3661:
URL: https://github.com/apache/iceberg/pull/3661#discussion_r764260024
##########
File path:
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java
##########
@@ -163,8 +167,25 @@ public DistributionMode distributionMode() {
return DistributionMode.fromName(modeName);
}
-  public DistributionMode deleteDistributionMode() {
-    return rowLevelCommandDistributionMode(TableProperties.DELETE_DISTRIBUTION_MODE);
+ public DistributionMode copyOnWriteDeleteDistributionMode() {
+ String deleteModeName = confParser.stringConf()
+ .option(SparkWriteOptions.DISTRIBUTION_MODE)
+ .tableProperty(TableProperties.DELETE_DISTRIBUTION_MODE)
+ .parseOptional();
+
+ if (deleteModeName != null) {
+ // range distribution only makes sense if the sort order is set
+ DistributionMode deleteMode = DistributionMode.fromName(deleteModeName);
+ if (deleteMode == RANGE && table.sortOrder().isUnsorted()) {
+ return HASH;
+ } else {
+ return deleteMode;
+ }
+ } else {
+ // use hash distribution if write distribution is range or hash
##########
Review comment:
One reason is to avoid changing the behavior we have right now. The
second reason is performance. It is pretty nice that we can do hash
partitioning by file, as it is far more efficient than a range-based shuffle
(in most cases).
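
The fallback behavior discussed above can be sketched as follows. This is an illustrative, self-contained approximation of the logic in the diff, not the actual `SparkWriteConf` implementation: the `DeleteModeFallback` class, the `resolve` method, and its parameters are hypothetical names introduced here, and the real code reads modes from Spark options and table properties rather than taking them as arguments.

```java
// Hypothetical sketch of the copy-on-write delete distribution fallback:
// an explicit delete mode wins (with RANGE downgraded to HASH when the
// table is unsorted), otherwise fall back to HASH whenever the general
// write distribution is RANGE or HASH.
enum DistributionMode { NONE, HASH, RANGE }

class DeleteModeFallback {
  // explicitDeleteMode: delete distribution mode set by the user (may be null)
  // writeMode: the table's general write distribution mode
  // sorted: whether the table has a defined sort order
  static DistributionMode resolve(
      DistributionMode explicitDeleteMode, DistributionMode writeMode, boolean sorted) {
    if (explicitDeleteMode != null) {
      // range distribution only makes sense if the sort order is set
      if (explicitDeleteMode == DistributionMode.RANGE && !sorted) {
        return DistributionMode.HASH;
      }
      return explicitDeleteMode;
    }

    // no explicit delete mode: hash-partitioning (e.g. by file) is usually
    // cheaper than a range-based shuffle, so prefer HASH when the write
    // distribution is RANGE or HASH
    if (writeMode == DistributionMode.RANGE || writeMode == DistributionMode.HASH) {
      return DistributionMode.HASH;
    }
    return DistributionMode.NONE;
  }
}
```

Under this sketch, a table with a `range` write distribution but no explicit delete mode would get hash-distributed copy-on-write deletes, which preserves the pre-existing behavior the comment refers to.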
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]