rdblue commented on issue #1422: URL: https://github.com/apache/iceberg/issues/1422#issuecomment-700918564
I think I agree with the metadata table approach. Because Presto can run tasks and planning at the same time, this is less of an issue there. The work done for Spark in option 2 could also translate into a parallel scan on a Presto metadata table (converting partition predicates to filters on metadata table columns). Flink is much more likely to consume tables incrementally, so I don't think it would be a big issue there for now (but it would be nice to hear from them).

Risk is lower with option 2, and I think it is the better option. It also pushes on the metadata tables in healthy ways: it would incentivize building pushdown in the `files` and `entries` metadata tables and might require adding a `delete_files` metadata table. Those are good side effects of implementing it that way.
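
To illustrate the idea of converting partition predicates to filters on metadata table columns, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: `FileRow`, `partition_filter`, and `matching_files` are hypothetical names, the in-memory rows stand in for a `files`-style metadata table, and none of it is Iceberg, Spark, or Presto API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class FileRow:
    """Hypothetical row of a `files`-style metadata table (not the real schema)."""
    file_path: str
    partition: Dict[str, object]  # partition column name -> partition value
    record_count: int


def partition_filter(predicates: Dict[str, object]) -> Callable[[FileRow], bool]:
    """Turn equality predicates on partition columns into a filter over
    the metadata table's `partition` column."""
    def matches(row: FileRow) -> bool:
        return all(row.partition.get(col) == value
                   for col, value in predicates.items())
    return matches


def matching_files(rows: Iterable[FileRow], predicates: Dict[str, object]) -> List[str]:
    """Return the paths of data files whose partition satisfies every predicate."""
    keep = partition_filter(predicates)
    return [row.file_path for row in rows if keep(row)]


# Made-up rows; a real engine would scan the metadata table in parallel tasks.
rows = [
    FileRow("s3://bucket/t/day=2020-09-28/a.parquet", {"day": "2020-09-28"}, 100),
    FileRow("s3://bucket/t/day=2020-09-29/b.parquet", {"day": "2020-09-29"}, 250),
    FileRow("s3://bucket/t/day=2020-09-29/c.parquet", {"day": "2020-09-29"}, 75),
]

selected = matching_files(rows, {"day": "2020-09-29"})
```

Because the filter only reads a column of the metadata table, an engine could in principle evaluate it inside ordinary scan tasks instead of during single-threaded planning, which is the appeal of option 2 as described above.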
