[GitHub] [iceberg] aokolnychyi commented on pull request #1947: [WIP] Spark MERGE INTO Support (copy-on-write implementation)

GitBox Mon, 21 Dec 2020 04:17:35 -0800


aokolnychyi commented on pull request #1947:
URL: https://github.com/apache/iceberg/pull/1947#issuecomment-748944819



   Is there enough consensus on making the cardinality check optional to match 
Hive and to avoid an extra inner join for merge-on-read? I think it should be 
enabled by default to prevent correctness problems.
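
   For readers less familiar with the check being discussed: a MERGE is 
ambiguous when one target row matches more than one source row, so the check 
has to detect that case and fail the operation. The sketch below is plain 
Python, not Spark or Iceberg code. The function name and the 
`(target_row_id, source_row_id)` pair layout are hypothetical, chosen only to 
illustrate the rule.

```python
from collections import Counter


def check_merge_cardinality(matched_pairs):
    """Fail if any target row matched more than one source row.

    matched_pairs: iterable of (target_row_id, source_row_id) tuples, standing
    in for the output of joining target and source on the MERGE condition.
    Both the name and the input layout are assumptions made for this sketch.
    """
    # Count how many source rows matched each target row.
    matches_per_target = Counter(target_id for target_id, _ in matched_pairs)

    # Any target row with more than one match makes the MERGE ambiguous.
    offenders = sorted(t for t, n in matches_per_target.items() if n > 1)
    if offenders:
        raise ValueError(
            "MERGE cardinality violation: target rows %s matched multiple source rows"
            % offenders
        )
```

   Making the check optional would amount to skipping this validation, which 
is why a default of "enabled" is the safer choice for correctness.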
   
   I don't think we agreed on how to implement the cardinality check. I had 
some thoughts in 
[this](https://github.com/apache/iceberg/pull/1947#issuecomment-747450897) 
comment. @dilipbiswal @rdblue @RussellSpitzer, what is your take on this? How 
do you see it being implemented?
   
   @RussellSpitzer did mention a corner case where the accumulator approach 
consumes a lot of memory on the driver: if each executor collects a large set 
of unique files and all of those sets are sent to the driver to be merged into 
a single set, the driver ends up holding many copies of the same file entries. 
I am not sure we can overcome it, though.
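
   A rough way to picture that corner case, as a plain-Python model rather 
than Spark code: the helper below (a hypothetical name) compares the number of 
file entries the driver would hold if every partial set arrived before any 
merging happened against the size of the final merged set. Buffering all 
partial sets at once is a worst-case simplifying assumption, not a statement 
about how Spark schedules accumulator merges.

```python
def driver_file_entries(per_task_file_sets):
    """Return (buffered_entries, merged_entries) for a list of partial sets.

    per_task_file_sets: list of sets of file paths, one per task or executor.
    buffered_entries models the worst case where every partial set sits on
    the driver at the same time; merged_entries is the size of their union.
    """
    # Worst case: every partial set is held on the driver simultaneously.
    buffered_entries = sum(len(files) for files in per_task_file_sets)

    # After merging, each distinct file path is kept exactly once.
    merged = set().union(*per_task_file_sets)
    return buffered_entries, len(merged)


# 100 tasks that each saw the same 1,000 files.
same_files = {"file-%d.parquet" % i for i in range(1000)}
buffered, merged = driver_file_entries([set(same_files) for _ in range(100)])
print(buffered, merged)  # 100000 1000
```

   The gap between the two numbers grows with the overlap between tasks, which 
is the duplication described above.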
   
   


