aokolnychyi commented on a change in pull request #3661:
URL: https://github.com/apache/iceberg/pull/3661#discussion_r762212642
##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java
##########
@@ -163,8 +167,25 @@ public DistributionMode distributionMode() {
return DistributionMode.fromName(modeName);
}
- public DistributionMode deleteDistributionMode() {
-   return rowLevelCommandDistributionMode(TableProperties.DELETE_DISTRIBUTION_MODE);
+ public DistributionMode copyOnWriteDeleteDistributionMode() {
Review comment:
I did not plan to have a custom method for copy-on-write, but the question is
whether we want to support the `range` distribution mode for copy-on-write
deletes. There is now a separate `DELETE_DISTRIBUTION_MODE` property, so we
need to define our behavior when it is set to `range`. I am debating whether
that is useful. The only use case I can think of is a relatively small cluster
issuing the deletes, where we cannot hash all the data for a file into a single
task. In that case, we may prefer `range` if the table's sort order is set.
Any thoughts are welcome.
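To make the trade-off concrete, here is a minimal, self-contained sketch of the decision being debated. The enum mirrors Iceberg's `DistributionMode` values, but `effectiveMode` and its parameters are hypothetical helpers for illustration, not the PR's actual code: the idea is that `range` only makes sense when a sort order exists, and otherwise hashing rows to the files they rewrite is the safer fallback.

```java
// Hypothetical sketch of choosing a distribution mode for copy-on-write
// deletes. The enum values mirror org.apache.iceberg.DistributionMode;
// the helper itself is illustrative only.
enum DistributionMode { NONE, HASH, RANGE }

final class CopyOnWriteDeleteModeSketch {
  // Returns the mode to actually use, given the configured
  // DELETE_DISTRIBUTION_MODE and whether the table defines a sort order.
  static DistributionMode effectiveMode(DistributionMode configured, boolean tableHasSortOrder) {
    if (configured == DistributionMode.RANGE && !tableHasSortOrder) {
      // Range distribution needs ordering keys; without a sort order,
      // fall back to hashing rows by the file they rewrite.
      return DistributionMode.HASH;
    }
    return configured;
  }
}
```

Under this sketch, a small cluster could still configure `range` to spread a large per-file rewrite across several tasks, but only when a sort order supplies the range keys.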
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]