aokolnychyi commented on a change in pull request #3661:
URL: https://github.com/apache/iceberg/pull/3661#discussion_r762212642
##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java
##########
@@ -163,8 +167,25 @@ public DistributionMode distributionMode() {
return DistributionMode.fromName(modeName);
}
- public DistributionMode deleteDistributionMode() {
-   return rowLevelCommandDistributionMode(TableProperties.DELETE_DISTRIBUTION_MODE);
+ public DistributionMode copyOnWriteDeleteDistributionMode() {
Review comment:
I did not plan to have a custom method for copy-on-write, but the question is
whether we want to support the `range` distribution mode for copy-on-write
deletes. There is now a separate `DELETE_DISTRIBUTION_MODE` property, so we
need to define our behavior when it is set to `range`. I am debating whether
that is useful. The only use case I can think of is a relatively small cluster
issuing the deletes, where we cannot hash all the data for a file into a single
task. In that case, we may prefer `range` if the table's sort order is set.
Any thoughts are welcome.
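To make the trade-off concrete, here is a minimal, self-contained sketch of the decision being debated. The enum mirrors Iceberg's `DistributionMode` values, but `effectiveMode` and its parameters are hypothetical helpers for illustration, not the PR's actual code: the idea is that `range` only makes sense when a sort order exists, and otherwise hashing rows to the files they rewrite is the safer fallback.

```java
// Hypothetical sketch of choosing a distribution mode for copy-on-write
// deletes. The enum values mirror org.apache.iceberg.DistributionMode;
// the helper itself is illustrative only.
enum DistributionMode { NONE, HASH, RANGE }

final class CopyOnWriteDeleteModeSketch {
  // Returns the mode to actually use, given the configured
  // DELETE_DISTRIBUTION_MODE and whether the table defines a sort order.
  static DistributionMode effectiveMode(DistributionMode configured, boolean tableHasSortOrder) {
    if (configured == DistributionMode.RANGE && !tableHasSortOrder) {
      // Range distribution needs ordering keys; without a sort order,
      // fall back to hashing rows by the file they rewrite.
      return DistributionMode.HASH;
    }
    return configured;
  }
}
```

Under this sketch, a small cluster could still configure `range` to spread a large per-file rewrite across several tasks, but only when a sort order supplies the range keys.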
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]