Yufei, Adnan, thanks for taking a look at the proposal. I definitely understand the concern and agree that there should be a way to avoid running compute-intensive workloads in the Polaris server and/or the metadata DB. Still, my preferred approach would be to implement the entire functionality first and make it configurable later on, when we have a better idea of what the Delegation Service will look like (planning will sit behind a feature flag, after all). If that sounds fine, I can adjust the proposal to include eventual integration with the Delegation Service (both for the ScanPlanner SPI and for indexing) rather than make the Delegation Service a hard prerequisite.
Regarding the SQL pruning index: I agree that it's a big topic and probably valuable even outside the scope of Polaris. Still, since there's no existing spec for anything like it outside of Polaris, I think it makes sense to start laying the foundation for it here for this particular use case, don't you agree? In terms of compute, the actual indexing can happen "externally", perhaps orchestrated by the Polaris CLI rather than as a side effect of a snapshot update.

In short, while I agree that we should coordinate planning and the Delegation Service, I'd much rather implement the feature first and then build the Delegation Service around it, especially since there are both types of delegation requirements here (invoking an external planner and notifying an external indexer).

Thanks,
Tornike

On Fri, Jun 19, 2026 at 2:12 AM Adnan Hemani via dev <[email protected]> wrote:

> I agree with Yufei - I don't think we can implement something as heavy as
> server-side planning directly onto Polaris as it stands. I think we need to
> revisit the Delegation Service discussion; it would be a great place to
> implement this type of functionality.
>
> Best,
> Adnan Hemani
>
> On Wed, Jun 17, 2026 at 4:11 PM Yufei Gu <[email protected]> wrote:
>
> > Thanks for putting this together. The first phase sounds good to me.
> >
> > My main concern is that, without some form of delegation service, scan
> > planning could easily become a heavy workload that impacts Polaris
> > performance.
> >
> > The SQL pruning index is also a pretty big topic with a lot of design
> > choices around ownership, consistency, updates, and operations. I'm not
> > sure Polaris itself should be responsible for managing the index.
> >
> > One possible direction is to delegate scan planning and indexing to a
> > separate service. That would keep Polaris focused on catalog and
> > governance responsibilities while still enabling these optimizations.
> > In a way, that brings us back to the delegation service discussion.
> >
> > Curious what others think.
> >
> > Yufei
> >
> > On Tue, Jun 16, 2026 at 12:44 AM Tornike Gurgenidze <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I drafted a proposal for adding Iceberg REST-compliant scan planning
> > > support to Polaris. The proposal doc can be found here:
> > >
> > > https://docs.google.com/document/d/1agpz4wwXxWfEy9fJLgPRDcrzdR5USM1i9vQhOBcHo3Q/edit?usp=sharing
> > >
> > > tl;dr: the doc proposes adding a straightforward implementation of scan
> > > planning in the initial phase and integrating the new endpoints with
> > > Polaris authz. Subsequently, we can enhance scan planning performance
> > > with 2 independent caching layers:
> > >
> > > - *CachingFileIO* - a FileIO wrapper that wraps existing FileIO
> > >   implementations and introduces a configurable Caffeine-powered
> > >   in-memory cache to speed up access to manifest files.
> > > - *SQL Pruning Index* - an additional index stored in an RDBMS and
> > >   asynchronously updated by Polaris when a new table snapshot is
> > >   registered. The goal is to store all relevant per-file stats in a db
> > >   table that will allow applying a pruning predicate in a single SQL
> > >   query. This is essentially a DuckLake-style index, but used only as a
> > >   file pruning index rather than the source of truth. The index is
> > >   allowed to lag behind the latest snapshot, in which case ScanPlanner
> > >   will use both the index and the underlying files for the relevant
> > >   parts of the table metadata.
> > >
> > > I have a POC for the caching layers in a private repo, which you can
> > > take a look at as well: https://github.com/tokoko/iceberg-cache/.
> > >
> > > thanks,
> > > Tornike
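To make the CachingFileIO layer from the proposal concrete, here is a minimal Python sketch. It assumes a simplified FileIO with a single `read(path) -> bytes` method; Iceberg's actual `FileIO` hands out `InputFile`/`OutputFile` objects via `newInputFile`/`newOutputFile` instead, and the proposal uses Caffeine rather than the hand-rolled LRU map below. `FileIO`, `InMemoryFileIO`, `CachingFileIO`, `max_entries`, and the "`.avro` means manifest" rule are all hypothetical names and assumptions for illustration. Caching by path without invalidation is reasonable here because Iceberg manifest files are immutable once written.

```python
from collections import OrderedDict
from typing import Dict, Protocol


class FileIO(Protocol):
    """Hypothetical, simplified stand-in for Iceberg's FileIO interface."""

    def read(self, path: str) -> bytes: ...


class InMemoryFileIO:
    """Toy backing store (standing in for S3/GCS/ADLS) that counts reads."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files
        self.reads = 0

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self.files[path]


class CachingFileIO:
    """Wraps another FileIO and caches manifest bytes in a bounded LRU map.

    Plays the role of the Caffeine cache in the proposal: capped by entry
    count, evicting the least recently used manifest once full.
    """

    def __init__(self, delegate: FileIO, max_entries: int = 1000) -> None:
        self.delegate = delegate
        self.max_entries = max_entries
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()

    @staticmethod
    def _is_manifest(path: str) -> bool:
        # Assumption: manifests and manifest lists are the Avro files.
        return path.endswith(".avro")

    def read(self, path: str) -> bytes:
        if not self._is_manifest(path):
            return self.delegate.read(path)  # uncached pass-through
        if path in self._cache:
            self._cache.move_to_end(path)  # hit: mark most recently used
            return self._cache[path]
        data = self.delegate.read(path)  # miss: fetch from storage
        self._cache[path] = data
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict least recently used
        return data


if __name__ == "__main__":
    backing = InMemoryFileIO({"snap-1-m0.avro": b"manifest bytes"})
    file_io = CachingFileIO(backing, max_entries=100)
    file_io.read("snap-1-m0.avro")
    file_io.read("snap-1-m0.avro")
    print(backing.reads)  # 1: the second read is served from the cache
```

Because the wrapper exposes the same interface as the FileIO it wraps, the planner code does not need to know whether caching is enabled, which fits the "configurable" framing in the proposal.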
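The SQL Pruning Index idea, including the case where the index lags behind the latest snapshot, can be sketched as follows. This is a minimal illustration under several assumptions: SQLite stands in for the RDBMS, the schema (`data_files`, `file_stats`) is hypothetical and is not the DuckLake schema, only integer bounds and a single-column range predicate `lo <= column <= hi` are handled, and the index is rebuilt wholesale rather than updated incrementally. `PruningIndex`, `PendingChanges`, `might_match`, and `plan_files` are hypothetical names. Iceberg manifests already record per-column lower/upper bounds for each data file, which is where such stats would come from.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# Hypothetical per-file stats: column name -> (lower_bound, upper_bound).
Stats = Dict[str, Tuple[int, int]]


@dataclass
class PendingChanges:
    """Files added/removed by snapshots newer than the indexed one.

    In a real planner these would come from reading the newer manifests."""
    added: Dict[str, Stats] = field(default_factory=dict)
    removed: Set[str] = field(default_factory=set)


class PruningIndex:
    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.executescript(
            """
            CREATE TABLE data_files (file_path TEXT PRIMARY KEY);
            CREATE TABLE file_stats (
                file_path TEXT, column_name TEXT,
                lower_bound INTEGER, upper_bound INTEGER,
                PRIMARY KEY (file_path, column_name));
            """
        )
        self.indexed_snapshot_id: Optional[int] = None

    def index_snapshot(self, snapshot_id: int, files: Dict[str, Stats]) -> None:
        """Rebuild the index for one snapshot (asynchronous in the proposal)."""
        with self.db:  # single transaction, committed on success
            self.db.execute("DELETE FROM data_files")
            self.db.execute("DELETE FROM file_stats")
            for path, stats in files.items():
                self.db.execute("INSERT INTO data_files VALUES (?)", (path,))
                for col, (lo, hi) in stats.items():
                    self.db.execute(
                        "INSERT INTO file_stats VALUES (?, ?, ?, ?)",
                        (path, col, lo, hi))
        self.indexed_snapshot_id = snapshot_id

    def candidates(self, column: str, lo: int, hi: int) -> Set[str]:
        """One query: drop a file only if its stats prove no row can match;
        files without stats for the column are kept."""
        rows = self.db.execute(
            """
            SELECT f.file_path FROM data_files f
            WHERE NOT EXISTS (
                SELECT 1 FROM file_stats s
                WHERE s.file_path = f.file_path AND s.column_name = ?
                  AND (s.upper_bound < ? OR s.lower_bound > ?))
            """,
            (column, lo, hi),
        )
        return {path for (path,) in rows}


def might_match(stats: Stats, column: str, lo: int, hi: int) -> bool:
    """Same pruning rule as the SQL query, applied in memory."""
    if column not in stats:
        return True
    f_lo, f_hi = stats[column]
    return not (f_hi < lo or f_lo > hi)


def plan_files(index: PruningIndex, pending: PendingChanges,
               column: str, lo: int, hi: int) -> Set[str]:
    """Combine the (possibly stale) index with changes it has not seen yet."""
    from_index = index.candidates(column, lo, hi) - pending.removed
    from_manifests = {p for p, s in pending.added.items()
                      if might_match(s, column, lo, hi)}
    return from_index | from_manifests
```

For example, if the index was built at snapshot 1 and a later snapshot removed one file and added two more, `plan_files` subtracts the removed path from the index's answer and evaluates only the newly added files' stats in memory, so the result stays correct while the index catches up. This mirrors the "index is allowed to lag" behavior described in the proposal, with the index used purely for pruning rather than as the source of truth.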
