RussellSpitzer commented on issue #8387:
URL: https://github.com/apache/iceberg/issues/8387#issuecomment-1691900501

   If I remember correctly, the root of the issue is this. To do a 
storage-partitioned join (SPJ), we first align the two sides based on their partitions:
   ```
   [ Source Partition ] === [Destination Partition]
   ```
   But sometimes this pairing is poorly balanced, so we want to be able to break up 
one of the two sides into multiple parts, with each part paired against the 
other side:
   ```
   [Source partition] === [Destination Partition Part 1]
   [Source partition] === [Destination Partition Part 2]
   [Source partition] === [Destination Partition Part 3]
   ```
   
   For "WHEN MATCHED" this is fine, because if any task finds a match we can 
perform the action on that result. Each task can still be treated completely 
independently.
   
   For "WHEN NOT MATCHED" we have a problem, because we can only apply the 
change if none of the tasks finds a match for a given expression. We don't have a 
mechanism for doing a reduction over the key (or something like that) after doing 
our checks. This means we can't use our SPJ optimization and have to do a full 
shuffle.
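
A toy sketch of the difference, in plain Python rather than actual Spark or Iceberg code. The keys, the three-way split, and the `run_task` helper are all made up for illustration; the only point is that per-task "matched" results can be unioned, while per-task "not matched" results need a reduction across every task that saw the same source partition.

```python
# Hypothetical data: one source partition is replicated against three
# parts of a single destination partition (the split layout above).
source_partition = [1, 2, 3, 4]   # join keys on the source side
destination_parts = [[1], [2], [3]]  # destination partition split into 3 parts


def run_task(source_keys, destination_part):
    """One task: join the full source partition with one destination part."""
    destination_keys = set(destination_part)
    matched = {k for k in source_keys if k in destination_keys}
    not_matched = {k for k in source_keys if k not in destination_keys}
    return matched, not_matched


results = [run_task(source_partition, part) for part in destination_parts]

# WHEN MATCHED: each task's answer is already final, so a plain union works.
matched = set().union(*(m for m, _ in results))

# WHEN NOT MATCHED, treating tasks independently: keys 1, 2 and 3 each look
# unmatched in two of the three tasks, so they would be acted on by mistake.
naive_not_matched = set().union(*(n for _, n in results))

# WHEN NOT MATCHED, done correctly: a key counts as unmatched only if every
# task reports it unmatched, which is a reduction (intersection) over the key.
correct_not_matched = set.intersection(*(n for _, n in results))

print(matched)              # {1, 2, 3}
print(naive_not_matched)    # {1, 2, 3, 4}
print(correct_not_matched)  # {4}
```

The intersection step is the cross-task reduction that the split tasks have no way to perform on their own.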
   
   I think I have that all right, but I'm mostly remembering our discussion 
when this was originally being implemented.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

