aokolnychyi commented on pull request #2017:
URL: https://github.com/apache/iceberg/pull/2017#issuecomment-754533819


   > I don't agree that those optimizations are prohibited. I think they are still possible, it would just make the implementation different.
   
   By "prohibited" I mean that, as far as I understand, we won't have separate 
delete/update/merge nodes after the rewrite, so equivalent optimizations will 
be harder. For example, a rule that drops a branch of `MergeIntoTable` would be 
trivial if we ran it before the rewrite. The same optimization is possible 
after the rewrite, but it would be harder to implement.
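
   To illustrate why such a rule is simple while the merge node still exists, 
here is a minimal sketch on a toy plan model. The classes below only mimic the 
shape of `MergeIntoTable`; the field names, `Literal`, `Predicate`, and 
`drop_dead_branches` are hypothetical and not Spark or Iceberg APIs.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A condition that has already been folded to a constant."""
    value: bool


@dataclass(frozen=True)
class Predicate:
    """An opaque condition that must be evaluated per row."""
    sql: str


Condition = Union[Literal, Predicate]


@dataclass(frozen=True)
class MergeAction:
    kind: str                              # "update", "delete" or "insert"
    condition: Optional[Condition] = None  # None means unconditional


@dataclass(frozen=True)
class MergeIntoTable:
    # Toy stand-in for the real logical node (assumed shape, not the real class).
    target: str
    source: str
    merge_condition: Condition
    matched_actions: Tuple[MergeAction, ...]
    not_matched_actions: Tuple[MergeAction, ...]


def is_always_false(cond: Optional[Condition]) -> bool:
    return isinstance(cond, Literal) and cond.value is False


def drop_dead_branches(plan: MergeIntoTable) -> MergeIntoTable:
    """Remove actions whose condition is the constant false."""
    def keep(actions: Tuple[MergeAction, ...]) -> Tuple[MergeAction, ...]:
        return tuple(a for a in actions if not is_always_false(a.condition))

    return replace(
        plan,
        matched_actions=keep(plan.matched_actions),
        not_matched_actions=keep(plan.not_matched_actions),
    )


plan = MergeIntoTable(
    target="db.target",
    source="db.source",
    merge_condition=Predicate("t.id = s.id"),
    matched_actions=(
        MergeAction("delete", Literal(False)),           # can never fire
        MergeAction("update", Predicate("s.op = 'U'")),
    ),
    not_matched_actions=(MergeAction("insert"),),        # unconditional
)
optimized = drop_dead_branches(plan)
print([a.kind for a in optimized.matched_actions])       # ['update']
```

   After the rewrite, the branches are no longer explicit fields on a single 
node, so a rule with the same effect would first have to recognize the shape of 
the rewritten plan.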
   
   > The problem is that the analyzer rule to add metadata columns to the logical plan runs in the analyzer
   
   This adds another perspective. We could make it work with row-level nodes 
too if we had a concept of `_row_id` that a data source would return.
   
   > And, we lose more depending on what batch we use in the optimizer.
   
   I think the initial plan was to do the rewrite after operator optimization 
so that the conditions are already optimal. However, I agree that we won't run 
the operator optimization batch (and potentially some other optimizer rules) 
after the rewrite, which is probably my biggest concern about rewriting in the 
optimizer.
   
   Overall, I believe we should think through how the rewrite in the analyzer 
will work, how to avoid planning the job twice, how to apply extra 
optimizations later, etc. If we can come up with solutions to those problems, 
I think rewriting in the analyzer will be the safer option.
   

