aokolnychyi commented on pull request #2017: URL: https://github.com/apache/iceberg/pull/2017#issuecomment-754533819
> I don't agree that those optimizations are prohibited. I think they are still possible, it would just make the implementation different.

By prohibited I mean that we won't have separate delete/update/merge nodes after the rewrite (as far as I understand). Therefore, equivalent optimizations will be harder. For example, a rule to drop a branch on `MergeIntoTable` would be trivial if we run it before the rewrite. Doing the same optimization after the rewrite is possible but would be harder.

> The problem is that the analyzer rule to add metadata columns to the logical plan runs in the analyzer

This adds another perspective. We could make it work with row-level nodes too if we had a concept of `_row_id` that a data source would return.

> And, we lose more depending on what batch we use in the optimizer.

I think the initial plan was to do this after operator optimization to make sure conditions are optimal. However, I agree that we won't run the operator optimization batch (and potentially some other optimizer rules) after the rewrite, which is probably my biggest concern about rewriting in the optimizer.

Overall, I believe we should think through how the rewrite in the analyzer will work, how to avoid planning the job twice, how to do extra optimization later, etc. If we can come up with solutions to those problems, I think it will be safer.
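
To illustrate the `MergeIntoTable` point, here is a toy sketch of why dropping a branch is easy while the merge node still exists. It is written in Python with hypothetical stand-in classes (`Literal`, `MergeAction`, a simplified `MergeIntoTable`, `drop_dead_branches`); none of this is Spark's Catalyst API or Iceberg code, and it assumes an earlier rule has already folded the clause condition down to a literal `False`.

```python
from dataclasses import dataclass, replace
from typing import Tuple


# Toy stand-ins for logical plan nodes. These are hypothetical and are
# NOT the real Spark Catalyst or Iceberg classes.
@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class MergeAction:
    kind: str          # "update", "delete" or "insert"
    condition: object  # None = no extra condition; Literal(False) = folded to false


@dataclass(frozen=True)
class MergeIntoTable:
    target: str
    source: str
    matched_actions: Tuple[MergeAction, ...]
    not_matched_actions: Tuple[MergeAction, ...]


def is_always_false(action: MergeAction) -> bool:
    """True when the clause condition is the literal False."""
    cond = action.condition
    return isinstance(cond, Literal) and cond.value is False


def drop_dead_branches(plan: MergeIntoTable) -> MergeIntoTable:
    """Remove WHEN clauses that can never fire.

    This is a simple filter only because the clauses are still explicit
    fields on the merge node, i.e. the rule runs before the rewrite.
    """
    return replace(
        plan,
        matched_actions=tuple(
            a for a in plan.matched_actions if not is_always_false(a)
        ),
        not_matched_actions=tuple(
            a for a in plan.not_matched_actions if not is_always_false(a)
        ),
    )


if __name__ == "__main__":
    plan = MergeIntoTable(
        target="t",
        source="s",
        matched_actions=(
            MergeAction("update", Literal(False)),  # dead branch
            MergeAction("delete", None),
        ),
        not_matched_actions=(MergeAction("insert", None),),
    )
    print(drop_dead_branches(plan))
```

After the rewrite the same clause would no longer be a field on a merge node, so an equivalent rule would have to recognize it inside whatever join/project/filter shape the rewrite produced, which is the "possible but harder" case.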
