Hi,

I drafted a proposal for adding Iceberg REST-compliant scan planning support to Polaris. The proposal doc can be found here: https://docs.google.com/document/d/1agpz4wwXxWfEy9fJLgPRDcrzdR5USM1i9vQhOBcHo3Q/edit?usp=sharing

tl;dr: the doc proposes to first add a straightforward implementation of scan planning in the initial phase and integrate the new endpoints with Polaris authz. Subsequently, we can improve scan planning performance with 2 independent caching layers:

- *CachingFileIO* - a FileIO wrapper around existing FileIO implementations that introduces a configurable Caffeine-powered in-memory cache to speed up access to manifest files.
- *SQL Pruning Index* - an additional index stored in an RDBMS and asynchronously updated by Polaris when a new table snapshot is registered. The goal is to store all relevant per-file stats in a DB table so that a pruning predicate can be applied in a single SQL query. This is essentially a DuckLake-style index, but used only as a file pruning index rather than as the source of truth. The index is allowed to lag behind the latest snapshot, in which case ScanPlanner will use both the index and the underlying files, each for the relevant parts of the table metadata.

I have a POC for the caching layers in a private repo, which you can take a look at as well: https://github.com/tokoko/iceberg-cache/

Thanks,
Tornike
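
P.S. To make the pruning-index idea a bit more concrete, here is a minimal sketch of applying a range predicate to per-file stats in a single SQL query. It is an illustration only, not the schema from the doc or the POC: the table layout and the names `file_stats` and `prune_files` are made up, SQLite stands in for the RDBMS, bounds are plain integers, and files are keyed directly by snapshot id, which is a simplification.

```python
import sqlite3

# Hypothetical schema: one row per (data file, column) with min/max bounds.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE file_stats (
        snapshot_id INTEGER NOT NULL,
        file_path   TEXT    NOT NULL,
        column_name TEXT    NOT NULL,
        lower_bound INTEGER,
        upper_bound INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO file_stats VALUES (?, ?, ?, ?, ?)",
    [
        (1, "data/a.parquet", "id", 0, 99),
        (1, "data/b.parquet", "id", 100, 199),
        (1, "data/c.parquet", "id", 200, 299),
    ],
)


def prune_files(conn, snapshot_id, column, lo, hi):
    """Return files whose [lower_bound, upper_bound] may overlap [lo, hi].

    A missing bound (NULL) is treated as "cannot prune", so the file is kept.
    """
    rows = conn.execute(
        """
        SELECT file_path FROM file_stats
        WHERE snapshot_id = ? AND column_name = ?
          AND (upper_bound IS NULL OR upper_bound >= ?)
          AND (lower_bound IS NULL OR lower_bound <= ?)
        ORDER BY file_path
        """,
        (snapshot_id, column, lo, hi),
    ).fetchall()
    return [row[0] for row in rows]


# Predicate: id BETWEEN 150 AND 220
candidates = prune_files(conn, 1, "id", 150, 220)
print(candidates)
```

With the three sample rows above, only the files whose `id` range can intersect 150..220 are returned; a.parquet is pruned without opening any manifest.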
