aokolnychyi commented on PR #4692: URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1130495934
I spent some time thinking about this. Let me summarize how I understand the proposal.

The use case we are talking about is copy-on-write DELETE **_executed using a broadcast join_** where we read files from the current spec, with only one file per split, and the files are already reasonably compacted and sorted as needed. Right now, we can avoid the shuffle by setting the distribution mode to `none`, but we can't disable a potentially redundant local sort. Is my understanding correct?

We can't target UPDATE and MERGE as those may change the ordering/partition of records even if we read properly sorted/partitioned files. We can't target tables with small files as having a split per file will be costly. The first two restrictions seem reasonable, but I am afraid we can't target DELETE that triggers a shuffle, as it will distribute records across tasks by the delete condition, not necessarily by the partition/sort key. That poses a problem: Iceberg does not know which join implementation Spark is going to pick, so we can't say whether the sort is really redundant.

The proper solution would be to report the ordering of scan tasks to Spark, request the required distribution/ordering on write, and let Spark decide if the sort is redundant. There was a Spark proposal for exposing task ordering, but it is not available yet.

If we had a way to pass options to MERGE commands, we could simply support the same using these steps:

- Set `read.split.open-file-cost` to `Long.MaxValue` to force one file per split
- Set `write.delete.distribution-mode` to `none`
- Set write option `use-table-distribution-and-ordering` to `false`

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
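As a rough illustration of the three settings listed in the comment, here is a minimal Python sketch that only collects them and renders the two table properties as an `ALTER TABLE ... SET TBLPROPERTIES` statement. The table name `db.tbl` and the helper `set_tblproperties_sql` are hypothetical, and treating the first two settings as table properties is an assumption. The write option is kept separate because, as the comment notes, there is not yet a way to pass it to a row-level command.

```python
# Largest signed 64-bit value, i.e. what Long.MaxValue refers to on the JVM.
LONG_MAX = 2**63 - 1

# The two property names come from the comment; setting them as table
# properties is an assumption of this sketch.
TABLE_PROPERTIES = {
    "read.split.open-file-cost": str(LONG_MAX),  # force one file per split
    "write.delete.distribution-mode": "none",    # skip the shuffle on DELETE
}

# Write option named in the comment. It is only recorded here because the
# comment says there is no way yet to pass options to the command itself.
WRITE_OPTIONS = {"use-table-distribution-and-ordering": "false"}


def set_tblproperties_sql(table: str, props: dict) -> str:
    """Render an ALTER TABLE statement for the given properties.

    Hypothetical helper for illustration only; it builds a string and does
    not talk to Spark or Iceberg.
    """
    pairs = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({pairs})"


if __name__ == "__main__":
    # `db.tbl` is a placeholder table name.
    print(set_tblproperties_sql("db.tbl", TABLE_PROPERTIES))
    print(WRITE_OPTIONS)
```

The sketch stops at building the statement. Whether the local sort is actually redundant still depends on the join implementation Spark picks, which is the open problem described in the comment.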
