RussellSpitzer commented on issue #3607: URL: https://github.com/apache/iceberg/issues/3607#issuecomment-981737541
```
MERGE INTO iceberg_hive_cat.iceberg_poc_db.iceberg_tab target
USING source -- no need to select here I believe
ON target.col1 = source.col1
  AND target.col2 = source.col2
  AND target.col3 = source.col3
  AND part_date_col BETWEEN '2021-01-01' AND '2021-01-16'
-- WHEN MATCHED / WHEN NOT MATCHED clauses omitted
```

It works with hidden partitions as well.

MERGE INTO is performed by first joining Target against Source on the ON clause. During that join, the contents of the ON clause can be pushed down to both target and source where possible. The join type depends on the kinds of "matched" clauses present. For example, a "not matched" clause requires an outer join, while a "matched" clause only requires an inner join.

The matched clauses are then applied to specific rows in the result of this join. Because they are not applied universally, the predicates inside a matched clause cannot be pushed down to "source" or "target". This is why any predicates you want pushed down must be in the ON clause.

Pushdown on hidden partitioning works just as it would in a normal query. If the predicate is on a column that has been partitioned, we transform the predicate into the value that was used in partitioning. Certain predicates cannot be transformed, though, and require a full scan. For example, if you say `purchase_ts = timestampOf(2021-01-01)` and you are actually partitioning on `day(purchase_ts)`, the predicate is transformed into `day(purchase_ts) = day(timestampOf(2021-01-01))`. But if the partitioning was `bucket(userId)` and your predicate was `userId > 50`, there is no way to transform the `> 50`, because bucket uses hashing. In this case you would want to query on `userId in (50, 51, 52, ...)`, since we can transform equality predicates with the bucket function.
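To make the predicate transformation described above a bit more concrete, here is a rough Python sketch. It is not Iceberg's code: `day`, `bucket`, and `project` are hypothetical helpers made up for illustration, and the hash inside `bucket` is a toy stand-in (Iceberg's bucket transform is based on a Murmur3 hash). The idea it shows is that an order-preserving transform like `day` can carry range predicates over to the partition value, while a hashing transform like `bucket` can only carry equality and `in`.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day(ts):
    """Days since the Unix epoch; stands in for a day(ts) partition transform."""
    return (ts - EPOCH).days


def bucket(value, num_buckets=16):
    """Toy hash bucket (assumption: NOT Iceberg's real Murmur3-based hash)."""
    return (value * 2654435761 % 2**32) % num_buckets


def project(transform, op, literal):
    """Rewrite a column predicate as a predicate on the partition value.

    Returns (transform_name, op, partition_value), or None when the
    predicate cannot be transformed and a full scan is needed.
    """
    if transform is day:
        # day() preserves ordering, so range predicates survive. Strict
        # bounds are widened to inclusive ones, which is conservative:
        # it may keep an extra partition but never drops a matching one.
        widened = {"=": "=", "<": "<=", "<=": "<=", ">": ">=", ">=": ">="}
        return ("day", widened[op], day(literal))
    if transform is bucket:
        # Hashing destroys ordering, so only equality-style predicates work.
        if op == "=":
            return ("bucket", "=", bucket(literal))
        if op == "in":
            return ("bucket", "in", sorted({bucket(v) for v in literal}))
        return None
    raise ValueError("unknown transform")


ts = datetime(2021, 1, 1, tzinfo=timezone.utc)
print(project(day, "=", ts))                 # equality on a day-partitioned column
print(project(bucket, ">", 50))              # None: range on a bucketed column
print(project(bucket, "in", [50, 51, 52]))   # the set of buckets to read
```

A `None` result corresponds to the full-scan case: `userId > 50` gives the planner nothing to prune with, whereas the `in` list maps to a small set of buckets.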
