RussellSpitzer commented on issue #8387: URL: https://github.com/apache/iceberg/issues/8387#issuecomment-1691900501
So if I remember correctly, the root of the issue is this. To do a storage-partitioned join (SPJ) we first align things based on partitions:

```
[Source Partition] === [Destination Partition]
```

But sometimes this balancing is not good, so we want to be able to break up one of the two sides into multiple pieces. To do that we split one of the sides into multiple parts:

```
[Source Partition] === [Destination Partition Part 1]
[Source Partition] === [Destination Partition Part 2]
[Source Partition] === [Destination Partition Part 3]
```

For "WHEN MATCHED" this is fine, because if any task finds a match we can perform the action on that result. Each task can still be treated completely independently.

For "WHEN NOT MATCHED" we have a problem, because we can only apply the change if none of the tasks match for a given expression. We don't have a mechanism for doing a reduction over the key (or something like that) after doing our checks. This means we can't do our SPJ optimization and have to do a full shuffle.

I think I have that all right, but I'm mostly remembering our discussion from when this was originally being implemented.
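The difference between the two clauses can be sketched in plain Python. This is only a toy model, not Iceberg or Spark code: `task_matches`, `source_keys`, and `destination_parts` are hypothetical names, rows are reduced to bare join keys, and the three-way split is an assumed example.

```python
# Toy model: one source partition joined against a destination partition
# that has been split into several parts, one task per part.
source_keys = [1, 2, 3]
destination_parts = [[1], [2], [7]]  # hypothetical split into 3 parts


def task_matches(source, dest_part):
    """Per-task view: which source keys found a match in this part."""
    dest = set(dest_part)
    return {k: (k in dest) for k in source}


per_task = [task_matches(source_keys, part) for part in destination_parts]

# WHEN MATCHED: each task can act on its own matches independently.
matched_actions = [
    (task_id, k)
    for task_id, result in enumerate(per_task)
    for k, hit in result.items()
    if hit
]

# WHEN NOT MATCHED, decided naively per task: every task that misses a key
# would fire the action, even when another task matched that key.
naive_not_matched = [
    (task_id, k)
    for task_id, result in enumerate(per_task)
    for k, hit in result.items()
    if not hit
]

# A correct WHEN NOT MATCHED needs a reduction over the key across all
# tasks: the key is unmatched only if no task matched it.
not_matched = [
    k for k in source_keys if not any(result[k] for result in per_task)
]

print(matched_actions)         # [(0, 1), (1, 2)]
print(len(naive_not_matched))  # 7
print(not_matched)             # [3]
```

In the naive per-task version, keys 1 and 2 would trigger the not-matched action even though another task matched them, and key 3 would trigger it three times. The final step is the cross-task reduction that, per the comment, is not available without a shuffle.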
