aokolnychyi edited a comment on pull request #1947:
URL: https://github.com/apache/iceberg/pull/1947#issuecomment-747450897


   I think this PR is a great start, @dilipbiswal!
   
   I noted the following points that we need to address for correctness (some 
may be done separately):
   - We should perform the cardinality check as SQL standard requires.
   - We should align the assignments according to the target columns. This also 
applies to UPDATE.
   - We should think about how to group data before writing.
   - We will need more tests.
   
   There are also good-to-have points (these can be done in follow-ups if they turn out to be too much trouble):
   - We should use a LEFT ANTI join for merge statements that contain only `WHEN NOT MATCHED THEN INSERT` clauses.
   - We should use a RIGHT OUTER join for merge statements that contain only `WHEN MATCHED` clauses.
   
   Let's discuss each point one by one.
   
   **Cardinality check**
   
   The SQL standard requires an exception to be thrown if the ON clause in MERGE is such that more than one source row matches a single target row. See [this](https://issues.apache.org/jira/browse/HIVE-14949) Hive issue for more info.
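   
   To make this concrete, here is a hedged sketch (the table, schema, and an active `SparkSession` called `spark` are assumptions for illustration) of a MERGE the standard considers ambiguous because two source rows match the same target row:
   
   ```scala
   // Assumes a table db.target(id INT, data STRING) that contains a row with id = 1.
   // Two source rows match that target row, so a MERGE with the cardinality check
   // enabled must fail instead of silently applying both updates.
   spark.sql("""
     MERGE INTO db.target t
     USING (SELECT 1 AS id, 'a' AS data UNION ALL SELECT 1 AS id, 'b' AS data) s
     ON t.id = s.id
     WHEN MATCHED THEN UPDATE SET t.data = s.data
   """)
   ```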
   
   Some systems do the cardinality check all the time while others, like Hive, make it optional. I'd say we should make it optional and let users configure it in the table properties by adding a `write.merge.cardinality-check.enabled` property (true by default). The main problem with the cardinality check is the performance penalty: it requires an inner join. We are already doing this inner join for copy-on-write to find matches, so we could extend that code to also do the cardinality check at the same time. I don't think we need an inner join for merge-on-read, though.
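   
   If we go with a table property, opting out could look roughly like this (the property name is the one proposed above, not something that exists today):
   
   ```scala
   // Hypothetical: disable the proposed cardinality check for a single table.
   spark.sql("""
     ALTER TABLE db.target
     SET TBLPROPERTIES ('write.merge.cardinality-check.enabled' = 'false')
   """)
   ```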
   
   To sum up, I'd vote for having a flag in table properties and making the cardinality check optional (just like Hive ACID).
   
   We need to think a bit about how we implement the cardinality check. Here, I am open to suggestions. One idea is to modify the nodes we use for dynamic file filtering. For example, we can use `monotonically_increasing_id` until we have a `row_id` metadata column, append it to rows in the target table before the inner join that finds matches, and then perform the cardinality check and collect the matching files. In order to make this efficient, we should reuse as much work as possible.
   
   One way to do that is to leverage an accumulator to track matching files (a rough sketch follows the steps below):
   - append `_row_id` and `_file` columns to the target table
   - do an inner join on the merge condition
   - define a UDF that accepts the file name, adds it to the accumulator, and returns 1
   - group by `_row_id`, perform the cardinality check
   - access the accumulator to get the matching files
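   
   Very roughly, in Scala (the table names, join key, and `SparkSession` are assumptions for illustration, and `input_file_name()` stands in for Iceberg's `_file` metadata column):
   
   ```scala
   import org.apache.spark.sql.functions._
   import scala.collection.JavaConverters._
   
   // Hedged sketch only, not the final implementation.
   val matchedFiles = spark.sparkContext.collectionAccumulator[String]("matched-files")
   
   // UDF that records the file containing a matched target row and returns 1 so the
   // result participates in the aggregation and cannot be optimized away.
   val recordFile = udf((file: String) => { matchedFiles.add(file); 1 })
   
   val target = spark.table("db.target")
     .withColumn("_row_id", monotonically_increasing_id())
     .withColumn("_file", input_file_name())
   val source = spark.table("db.source")
   
   val joined = target.join(source, target("id") === source("id"))  // the merge condition
     .select(col("_row_id"), recordFile(col("_file")).as("matched"))
   
   // Cardinality check: a target row matched more than once makes the MERGE ambiguous.
   val ambiguousRows = joined.groupBy("_row_id").agg(sum("matched").as("matches"))
     .filter(col("matches") > 1)
     .count()
   if (ambiguousRows > 0) {
     throw new IllegalStateException("MERGE matched a single target row more than once")
   }
   
   // The count above triggered the job, so the accumulator now holds the matching files.
   val filesToRewrite = matchedFiles.value.asScala.toSet
   ```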
   
   Another way looks like this (also sketched below):
   - append `_row_id` and `_file` columns to the target table
   - do an inner join on the merge condition
   - select `_row_id` and `_file` into a separate plan
   - temporarily cache the created plan (or persist it on executor nodes)
   - perform two queries in parallel: one for the cardinality check and one for the matching files
   - uncache/destroy the temp plan
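   
   A rough sketch of that variant, under the same assumptions about names and the Spark session as above (the two queries are shown sequentially here but could be submitted in parallel, e.g. from separate threads):
   
   ```scala
   import org.apache.spark.sql.functions._
   import org.apache.spark.storage.StorageLevel
   
   val target = spark.table("db.target")
     .withColumn("_row_id", monotonically_increasing_id())
     .withColumn("_file", input_file_name())
   val source = spark.table("db.source")
   
   // Keep only what both follow-up queries need and persist it on the executors.
   val matches = target.join(source, target("id") === source("id"))
     .select("_row_id", "_file")
     .persist(StorageLevel.MEMORY_AND_DISK)
   
   try {
     // Query 1: cardinality check.
     val ambiguousRows = matches.groupBy("_row_id").count().filter(col("count") > 1).count()
     if (ambiguousRows > 0) {
       throw new IllegalStateException("MERGE matched a single target row more than once")
     }
   
     // Query 2: matching files for dynamic file filtering.
     val filesToRewrite = matches.select("_file").distinct().collect().map(_.getString(0)).toSet
   } finally {
     matches.unpersist()
   }
   ```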
   
   **Align assignments**
   
   I don't think Spark aligns the assignments inside UPDATE or MERGE, and we won't be able to support updating nested fields without that. We will probably need a separate analyzer rule for this, and the same rule can cover both UPDATE and MERGE.
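   
   To illustrate what I mean by alignment (the schema and names below are made up):
   
   ```scala
   // Hypothetical target schema: id INT, address STRUCT<city: STRING, zip: STRING>, data STRING.
   // A user may assign only one nested field, in any order:
   spark.sql("UPDATE db.target SET address.city = 'SF' WHERE id = 1")
   
   // An alignment rule would have to expand the assignments so that every target column is
   // produced in schema order, conceptually:
   //   id      -> id
   //   address -> named_struct('city', 'SF', 'zip', address.zip)
   //   data    -> data
   // The same expansion applies to the UPDATE SET clauses of MERGE.
   ```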
   
   **Group data before writing**
   
   We need to think about how to group data before writing new files with our updates and new records. One option is to group and order by the partition columns. Another option is to group and order by the sort spec. A third option is to group updates and new records separately. Let's discuss.
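   
   For the first two options, the Spark-level shuffle/sort could look roughly like this (the partition column `event_date`, the sort columns, and `mergedOutput` as a stand-in for the rows MERGE produces are all assumptions):
   
   ```scala
   import org.apache.spark.sql.DataFrame
   import org.apache.spark.sql.functions.col
   
   // Placeholder for the combined updated + inserted rows that MERGE needs to write.
   val mergedOutput: DataFrame = spark.table("db.target")  // stand-in only
   
   // Option 1: group and order by the partition columns.
   val byPartition = mergedOutput
     .repartition(col("event_date"))
     .sortWithinPartitions(col("event_date"))
   
   // Option 2: group and order by the table's sort spec (assumed to be event_date, id here).
   val bySortSpec = mergedOutput
     .repartition(col("event_date"))
     .sortWithinPartitions(col("event_date"), col("id"))
   ```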

