rdblue commented on pull request #2017: URL: https://github.com/apache/iceberg/pull/2017#issuecomment-754277810
> It prohibits optimizations on delete/update/merge nodes and makes certain rewrites more complicated.

I don't agree that those optimizations are prohibited. I think they are still possible; the implementation would just be different. Your example would initially be converted to a full join producing rows for a merge node, but we could write a separate optimizer rule that matches a merge with a full join and checks whether the join type can be changed. Instead of doing the entire task in a single rewrite rule, we would have one rule to rewrite to the full join in the analyzer, one to prune cases with `false` conditions, and one to rewrite the join. I think that decomposing this into smaller independent rules is a good thing.

> If we rewrite the plans after operator optimizations as we planned earlier, we miss ...

We would also miss analyzer rules. In our Spark build, we have support for metadata columns, and that's how we project `_file` and `_pos`. The problem is that the rule that adds metadata columns to the logical plan runs in the analyzer, so we would have to add them to the projection in the optimizer rewrite rule. Quite a few tasks and optimizations end up in this one rule because it needs to run late; we're accumulating quite a list of rules that won't run on these plans by doing the rewrite in the optimizer.

And we lose more depending on which batch we use in the optimizer. If the rewrite is done in operator optimization, there's no guarantee that expressions are fully optimized before this rule runs; that could be a problem for the expr-equivalent-to-false example. On the other hand, expressions produced by the rule would still get optimized, unless we move it after operator optimization. If we do that, then the `Not(cond)` we introduce might be `Not(Not(...))` or another case that the optimizer would otherwise have simplified, and columns in the dynamic filter plan would not be pruned.
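The decomposition idea above can be sketched with a toy model. This is not Spark's Catalyst API, and all names here are hypothetical: the analyzer conservatively rewrites MERGE to a full outer join, and a separate, later rule matches that shape and narrows the join type once clause pruning has determined which unmatched rows are still needed.

```scala
// Toy model (hypothetical names, not Catalyst's Rule/LogicalPlan API).
sealed trait JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType
case object RightOuter extends JoinType
case object FullOuter extends JoinType

// Target is modeled as the left side of the join, source as the right side.
case class MergePlan(
    joinType: JoinType,
    needsUnmatchedTargetRows: Boolean, // e.g. NOT MATCHED BY SOURCE clauses remain
    needsUnmatchedSourceRows: Boolean) // e.g. NOT MATCHED (insert) clauses remain

// Analyzer step: always produce a full outer join for a merge.
def analyzeMerge(needsUnmatchedTarget: Boolean, needsUnmatchedSource: Boolean): MergePlan =
  MergePlan(FullOuter, needsUnmatchedTarget, needsUnmatchedSource)

// Separate optimizer rule: match a merge over a full join and check
// whether the join type can be changed.
def narrowJoin(plan: MergePlan): MergePlan = plan match {
  case MergePlan(FullOuter, false, false) => plan.copy(joinType = Inner)
  case MergePlan(FullOuter, true, false)  => plan.copy(joinType = LeftOuter)
  case MergePlan(FullOuter, false, true)  => plan.copy(joinType = RightOuter)
  case _                                  => plan
}
```

The point of the sketch is only that the narrowing check can live in its own rule, independent of the analyzer rewrite and of the rule that prunes `false`-condition clauses.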
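The `Not(Not(...))` concern can also be made concrete with a minimal toy expression tree (again hypothetical, not Catalyst's `Expression` API): this is the kind of double-negation collapse the optimizer normally applies, and which introduced expressions would miss if the rewrite runs after operator optimization.

```scala
// Toy expression tree; names are hypothetical, not Catalyst classes.
sealed trait Expr
case class Col(name: String) extends Expr
case class Not(child: Expr) extends Expr

// Simplification of the kind an expression-optimization rule would do:
// Not(Not(e)) collapses to e, recursively.
def simplifyNot(e: Expr): Expr = e match {
  case Not(Not(inner)) => simplifyNot(inner)
  case Not(inner)      => Not(simplifyNot(inner))
  case other           => other
}
```

If a rewrite rule wraps an already-negated condition after this simplification has run, the redundant `Not(Not(...))` survives into the physical plan.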
Overall, I'm leaning toward rewriting in the analyzer. I think we can still build the same optimizations, just in different ways. That said, we probably don't need to consider this until after the next release: getting something working is the first step, and then we can talk about refactoring into separate rules.
