aokolnychyi commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1130495934

   I spent some time thinking about this. Let me summarize how I understand the 
proposal.
   
   The use case we are talking about is copy-on-write DELETE **_executed using 
a broadcast join_** where we read files from the current spec with only one 
file per split, and the files are already reasonably compacted and sorted as 
needed. Right now, we can avoid the shuffle by setting the distribution mode to 
`none`, but we can't disable a potentially redundant local sort. Is my 
understanding correct?
   
   We can't target UPDATE and MERGE, as those may change the ordering or 
partitioning of records even if we read properly sorted/partitioned files. We 
also can't target tables with small files, as having a split per file would be 
costly. These two restrictions seem reasonable, but I am afraid we also can't 
target a DELETE that triggers a shuffle, as it will distribute records across 
tasks by the delete condition, not necessarily by the partition/sort key. That 
poses a problem: Iceberg does not know which join implementation Spark is going 
to pick, so we can't say whether the sort is really redundant. The proper 
solution would be to report the ordering of scan tasks to Spark, request the 
required distribution/ordering on write, and let Spark decide if the sort is 
redundant. There was a Spark proposal for exposing task ordering, but it is not 
available yet.
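
   As a toy illustration of why the join strategy matters (plain Python, not 
Spark or Iceberg code), the sketch below models two files that are each sorted 
by a sort key `ts` within one partition `p`. The row layout, the `id` join key, 
and all function names are assumptions made up for this example. With one task 
per file, the rows stay clustered and sorted; once rows are routed by the join 
key, each task sees a mix of partitions.

```python
# Toy model: each "file" holds rows of one partition, already sorted by `ts`.
FILES = {
    "p=a/file-1": [
        {"p": "a", "ts": 1, "id": 4},
        {"p": "a", "ts": 2, "id": 1},
        {"p": "a", "ts": 3, "id": 2},
    ],
    "p=b/file-2": [
        {"p": "b", "ts": 1, "id": 3},
        {"p": "b", "ts": 2, "id": 6},
        {"p": "b", "ts": 3, "id": 5},
    ],
}


def broadcast_tasks(files):
    """One task per file; rows keep the order they had in the file."""
    return [list(rows) for rows in files.values()]


def shuffle_tasks(files, num_tasks):
    """Rows are routed to tasks by the (hypothetical) join key `id`."""
    tasks = [[] for _ in range(num_tasks)]
    for rows in files.values():
        for row in rows:
            tasks[row["id"] % num_tasks].append(row)
    return tasks


def is_clustered_and_sorted(task):
    """True if the task holds a single partition with non-decreasing `ts`."""
    partitions = {row["p"] for row in task}
    ts_values = [row["ts"] for row in task]
    return len(partitions) <= 1 and ts_values == sorted(ts_values)


broadcast_ok = all(is_clustered_and_sorted(t) for t in broadcast_tasks(FILES))
shuffle_ok = all(is_clustered_and_sorted(t) for t in shuffle_tasks(FILES, 2))
print(broadcast_ok, shuffle_ok)
```

   In this model the local sort is redundant only in the broadcast case, which 
is the information Iceberg lacks at write time.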
   
   If we had a way to pass options to MERGE commands, we could support the same 
behavior with these steps:
   - Set `read.split.open-file-cost` to `Long.MaxValue` to force one file per 
split
   - Set `write.delete.distribution-mode` to `none`
   - Set write option `use-table-distribution-and-ordering` to `false`
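
   The steps above can be sketched as follows. The property and option names 
come from the list; everything else is an assumption for illustration. In 
particular, `pack_files` is a simplified greedy model of split planning in which 
each file weighs at least the open-file cost. It is not Iceberg's actual 
planner, and the file sizes and the 128 MB target are example values.

```python
LONG_MAX = 2**63 - 1  # numeric value of Long.MaxValue

# Table properties from the first two steps.
table_properties = {
    "read.split.open-file-cost": str(LONG_MAX),
    "write.delete.distribution-mode": "none",
}

# Write option from the third step. Passing it to a row-level command is
# exactly the missing piece, so it is only recorded here.
write_options = {"use-table-distribution-and-ordering": "false"}


def pack_files(file_sizes, target_split_size, open_file_cost):
    """Greedily group files into splits (simplified model, see lead-in).

    Each file weighs max(size, open_file_cost); a new split starts once
    adding the next file would exceed the target split size.
    """
    splits, current, current_weight = [], [], 0
    for size in file_sizes:
        weight = max(size, open_file_cost)
        if current and current_weight + weight > target_split_size:
            splits.append(current)
            current, current_weight = [], 0
        current.append(size)
        current_weight += weight
    if current:
        splits.append(current)
    return splits


MB = 1024 * 1024
sizes = [10 * MB, 20 * MB, 30 * MB]

# With a small open-file cost, all three files fit into one split.
small_cost_splits = pack_files(sizes, 128 * MB, 4 * MB)

# With the cost at Long.MaxValue, every file exceeds the target on its own,
# so each split ends up with exactly one file.
max_cost_splits = pack_files(sizes, 128 * MB, LONG_MAX)

print(len(small_cost_splits), len(max_cost_splits))
```

   The table properties could be applied with `ALTER TABLE ... SET 
TBLPROPERTIES`; the write option is the part that currently has no way to reach 
the command.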
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

