RussellSpitzer commented on issue #3607:
URL: https://github.com/apache/iceberg/issues/3607#issuecomment-981737541


   ```
   MERGE INTO iceberg_hive_cat.iceberg_poc_db.iceberg_tab target
   USING source -- no need to select here, I believe
   ON target.col1 = source.col1 AND target.col2 = source.col2
     AND target.col3 = source.col3
     AND part_date_col BETWEEN '2021-01-01' AND '2021-01-16'
   ```
   
   It works with hidden partitions as well.
   
   The MERGE INTO is performed by first doing a join of target vs. source
   on the ON clause. In that join, the contents of the ON clause can then be
   pushed down to both target and source where possible. The join type depends
   on the kinds of "matched" clauses. For example, a "not matched" clause
   requires an outer join, while a "matched" clause only requires an inner join.
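   
   As a rough sketch of that rule only (the helper name `merge_join_type` and
   its string return values are made up for illustration and are not Spark or
   Iceberg APIs), the choice of join could be written as:

```python
def merge_join_type(clause_kinds):
    """Pick a join type from the kinds of WHEN clauses in a MERGE.

    Hypothetical helper for illustration; not part of Spark or Iceberg.
    clause_kinds is an iterable of "matched" / "not matched" strings.
    """
    kinds = set(clause_kinds)
    if not kinds:
        raise ValueError("a MERGE needs at least one WHEN clause")
    unknown = kinds - {"matched", "not matched"}
    if unknown:
        raise ValueError(f"unknown clause kinds: {sorted(unknown)}")
    # A "not matched" clause must see rows that have no partner on the
    # other side, so those rows have to survive the join: outer join.
    if "not matched" in kinds:
        return "outer"
    # Only "matched" clauses: rows without a partner are never touched,
    # so an inner join is enough.
    return "inner"


print(merge_join_type(["matched"]))
print(merge_join_type(["matched", "not matched"]))
```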
   
   Matched clauses are then applied to specific rows in the result of this
   join. Because they are not applied universally, the predicates inside a
   matched clause cannot be pushed down to "source" or "target". This is why
   any predicates you want pushed down must be in the "ON" clause.
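   
   A toy model of why that is, assuming matched clauses are tried in order and
   the first one whose condition holds is applied to a row (the function, the
   `op` column, and the sample rows are all invented for this sketch):

```python
def apply_matched_clauses(joined_rows, clauses):
    """Apply the first clause whose condition holds to each joined row.

    Toy model only: clauses is a list of (condition, action_name) pairs.
    """
    results = []
    for row in joined_rows:
        for condition, action in clauses:
            if condition(row):
                results.append((row["id"], action))
                break
    return results


# Rows that already survived the join on the ON clause.
joined = [{"id": 1, "op": "delete"}, {"id": 2, "op": "update"}]

clauses = [
    (lambda r: r["op"] == "delete", "DELETE"),  # conditional matched clause
    (lambda r: True, "UPDATE"),                 # unconditional matched clause
]

# Correct: each clause condition is evaluated per row, after the join.
actions = apply_matched_clauses(joined, clauses)

# Wrong: treating the first clause's condition as a scan filter drops
# row 2 before the UPDATE clause ever gets to see it.
pushed = apply_matched_clauses(
    [r for r in joined if r["op"] == "delete"], clauses
)

print(actions)
print(pushed)
```

   A predicate in the ON clause has no such problem, because it applies to
   every row the merge considers.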
   
   This works with hidden partitioning as well, just as it would in a normal
   query. If the predicate is on a column that has been partitioned, we
   transform the predicate into the value that was used in partitioning.
   Certain predicates cannot be transformed, though, and require a full scan.
   
   For example, if you say `purchase_ts = timestampOf(2021-01-01)` and the
   table is actually partitioned on `day(purchase_ts)`, the predicate is
   transformed into `day(purchase_ts) = day(timestampOf(2021-01-01))`.
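   
   A minimal sketch of that rewrite, assuming `day` means whole days since
   1970-01-01 UTC and that each partition can be represented by its day value
   (`prune_by_day` and the sample partition list are hypothetical, not Iceberg
   code):

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day(ts):
    """Whole days between the Unix epoch and ts (a timezone-aware datetime)."""
    return (ts - EPOCH).days


def prune_by_day(partition_days, ts_literal):
    """Keep only partitions that can contain rows with purchase_ts = ts_literal.

    The row-level predicate `purchase_ts = ts_literal` implies the
    partition-level predicate `day(purchase_ts) = day(ts_literal)`.
    """
    wanted = day(ts_literal)
    return [d for d in partition_days if d == wanted]


literal = datetime(2021, 1, 1, tzinfo=timezone.utc)
partitions = [day(literal) - 1, day(literal), day(literal) + 1]

print(day(literal))
print(prune_by_day(partitions, literal))
```

   The transformed predicate only selects partitions; the original predicate
   still has to be checked against the rows inside them.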
   
   But if the partitioning was `bucket(userId)` and your predicate was
   `userId > 50`, there is no way to transform the `> 50` because bucket uses
   hashing. In this case you would want to query on
   `userId in (50, 51, 52, ...)`, since we can transform equality predicates
   with the bucket function.
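
   To make the difference concrete, here is a small sketch. It uses
   `zlib.crc32` as a stand-in hash purely to stay in the standard library
   (Iceberg's bucket transform is based on a 32-bit Murmur3 hash), and the
   bucket count of 16 and the function names are assumptions for the example:

```python
import zlib

NUM_BUCKETS = 16  # hypothetical bucket count for this example


def bucket(user_id, num_buckets=NUM_BUCKETS):
    """Stand-in bucket function; crc32 replaces Iceberg's Murmur3 hash here."""
    return zlib.crc32(str(user_id).encode("utf-8")) % num_buckets


def buckets_for_in(values, num_buckets=NUM_BUCKETS):
    """`userId IN (...)` maps to at most one bucket per listed value."""
    return {bucket(v, num_buckets) for v in values}


def buckets_for_greater_than(_bound, num_buckets=NUM_BUCKETS):
    """`userId > bound` cannot be narrowed: hashing does not preserve order,
    so a qualifying value may sit in any bucket."""
    return set(range(num_buckets))


in_buckets = buckets_for_in([50, 51, 52])
range_buckets = buckets_for_greater_than(50)

print(len(in_buckets), "bucket(s) to read for the IN predicate")
print(len(range_buckets), "bucket(s) to read for the > predicate")
```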


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


