aokolnychyi commented on a change in pull request #3661:
URL: https://github.com/apache/iceberg/pull/3661#discussion_r764260024
##########
File path:
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java
##########
@@ -163,8 +167,25 @@ public DistributionMode distributionMode() {
return DistributionMode.fromName(modeName);
}
-  public DistributionMode deleteDistributionMode() {
-    return rowLevelCommandDistributionMode(TableProperties.DELETE_DISTRIBUTION_MODE);
+ public DistributionMode copyOnWriteDeleteDistributionMode() {
+ String deleteModeName = confParser.stringConf()
+ .option(SparkWriteOptions.DISTRIBUTION_MODE)
+ .tableProperty(TableProperties.DELETE_DISTRIBUTION_MODE)
+ .parseOptional();
+
+ if (deleteModeName != null) {
+ // range distribution only makes sense if the sort order is set
+ DistributionMode deleteMode = DistributionMode.fromName(deleteModeName);
+ if (deleteMode == RANGE && table.sortOrder().isUnsorted()) {
+ return HASH;
+ } else {
+ return deleteMode;
+ }
+ } else {
+ // use hash distribution if write distribution is range or hash
##########
Review comment:
One reason is to avoid changing the behavior we have right now. The
second reason is performance. It is pretty nice that we can do hash
partitioning by file, as it is far more efficient than a range-based shuffle
(in most cases).
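
The fallback behavior discussed above can be sketched as follows. This is an illustrative, self-contained approximation of the logic in the diff, not the actual `SparkWriteConf` implementation: the `DeleteModeFallback` class, the `resolve` method, and its parameters are hypothetical names introduced here, and the real code reads modes from Spark options and table properties rather than taking them as arguments.

```java
// Hypothetical sketch of the copy-on-write delete distribution fallback:
// an explicit delete mode wins (with RANGE downgraded to HASH when the
// table is unsorted), otherwise fall back to HASH whenever the general
// write distribution is RANGE or HASH.
enum DistributionMode { NONE, HASH, RANGE }

class DeleteModeFallback {
  // explicitDeleteMode: delete distribution mode set by the user (may be null)
  // writeMode: the table's general write distribution mode
  // sorted: whether the table has a defined sort order
  static DistributionMode resolve(
      DistributionMode explicitDeleteMode, DistributionMode writeMode, boolean sorted) {
    if (explicitDeleteMode != null) {
      // range distribution only makes sense if the sort order is set
      if (explicitDeleteMode == DistributionMode.RANGE && !sorted) {
        return DistributionMode.HASH;
      }
      return explicitDeleteMode;
    }

    // no explicit delete mode: hash-partitioning (e.g. by file) is usually
    // cheaper than a range-based shuffle, so prefer HASH when the write
    // distribution is RANGE or HASH
    if (writeMode == DistributionMode.RANGE || writeMode == DistributionMode.HASH) {
      return DistributionMode.HASH;
    }
    return DistributionMode.NONE;
  }
}
```

Under this sketch, a table with a `range` write distribution but no explicit delete mode would get hash-distributed copy-on-write deletes, which preserves the pre-existing behavior the comment refers to.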
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]