aokolnychyi commented on pull request #2017:
URL: https://github.com/apache/iceberg/pull/2017#issuecomment-753906176
I have mixed feelings about this direction.
On one hand, it is great to rewrite this early in the analyzer and leave the
optimizer untouched. On the other hand, it prohibits optimizations on
delete/update/merge nodes and makes certain rewrites more complicated.
To give an example where having merge nodes in the optimizer may be helpful.
If we rewrite the following operation in the analyzer, we will have to use a
full outer join.
```
MERGE target t USING source s
ON t.id = s.id
WHEN MATCHED AND exp_equivalent_to_false
UPDATE SET …
WHEN NOT MATCHED
THEN INSERT
```
However, the optimizer can detect that our UPDATE condition always evaluates
to false, so it can simplify the plan:
```
MERGE target t USING source s
ON t.id = s.id
WHEN NOT MATCHED
THEN INSERT
```
The optimized merge operation can be executed using a left anti join instead
of a full outer join. Similarly, if we can get rid of the NOT MATCHED clause,
we can use a right outer join instead of a full outer join. There may be more
cases like this that we don't know yet.
Also, there are a lot of rewrites that happen in the optimizer like
@dilipbiswal mentioned so the current approach isn't that bad in my view. In
addition, row-level ops are a bit different compared to the refresh of
materialized views which is replaced with INSERTs. There is a clear translation
for refresh statements and it does not require any optimization.
Update/delete/merge commands are more complicated to me and I'll be fine
addressing them differently.
If we rewrite the plans after operator optimizations as we planned earlier,
we miss these rules:
- `PullupCorrelatedPredicates` (fixed in Spark 3.1 to cover
delete/update/merge)
- `OptimizeSubqueries` (works fine now as it runs on `SubqueryExpression`
which is a child of the row-level plans)
- Replacement of certain expressions with equivalent ones
- Operator optimization rules (we match the behavior for subqueries)
I think it is fine to run the rewrite after those rules.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]