Hi all,

While looking into how “purge table”, or any other Iceberg table/view related operation, can be integrated into the task-handling proposals, it became apparent that retrieving an Iceberg `FileIO` instance requires a surprisingly long chain of operations.
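As a reminder of why this matters at all: at execution time, a table-maintenance task really only needs a `FileIO` that is already configured with valid (subscoped) credentials for the table's storage location. Here is a minimal, purely illustrative sketch; the task shape is made up, only the Iceberg `FileIO` API is real:

```java
// Purely illustrative: a "purge table files" style task only needs a properly
// configured FileIO at execution time; how the file list is obtained is omitted.
import org.apache.iceberg.io.FileIO;

class PurgeFilesSketch {
  static void purge(FileIO io, Iterable<String> filesToDelete) {
    for (String path : filesToDelete) {
      io.deleteFile(path); // needs valid storage credentials when the task actually runs
    }
  }
}
```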
For posterity: the async-reliable-tasks proposal [1] ensures that tasks are eventually run. The delegation-service proposal [2] adds the idea of executing tasks in a different service, i.e. not on Polaris nodes/pods. Obviously, (table related) task executions need the storage location plus the configuration & credentials to access it.

The following is a (shortened) breakdown of what happens when storage credentials are retrieved. The implementation of `TaskFileIOSupplier` is supposed to provide the `FileIO` for a task (think: “purge table files”). To achieve this, the Polaris code base performs the following series of operations (see the condensed sketch further below):

1. Construct a `ResolvedPolarisEntity` with the table entity, without the catalog and namespace entities (*).
2. Call `DefaultFileIOFactory.loadFileIO` (with the `ResolvedPolarisEntity` as a `PolarisResolvedPathWrapper`).
3. `DefaultFileIOFactory.loadFileIO` gets the Polaris entity carrying the "right" storage configuration from the entity, parent namespace(s) or catalog in the `PolarisResolvedPathWrapper`.
4. It then calls `StorageCredentialCache.getOrGenerateSubScopeCreds`, which ...
5. ... calls `PolarisCredentialVendor.getSubscopedCredsForEntity`, which has to be implemented by all `PolarisMetaStoreManager` implementations, which ...
6. ... loads the Polaris table entity (the dropped one?) to then ...
7. ... call `loadPolarisStorageIntegration`, which ...
8. ... calls `PolarisStorageIntegrationProvider.getStorageIntegrationForConfig` to ...
9. ... get the (subscoped) credentials from the returned `PolarisStorageIntegration` (for S3, GCS, Azure).

(*) The namespaces and catalog are not needed here, because the Iceberg drop-table API (`IcebergCatalog.dropTable`) implementation extracts the resolved storage configuration, which is eventually passed to `PolarisMetaStoreManager.dropEntityIfExists` and ends up in the task entity.

The above also applies to every operation that accesses object storage, like an Iceberg `loadTable` operation (minus the task-specific parts).

`IntegrationPersistence.loadPolarisStorageIntegration` does not do anything persistence related (no "loading" or so) in any of the implementations. All "counterpart" implementations of `persistStorageIntegrationIfNeeded` do nothing.

Operations that need a `FileIO` instance perform an additional `loadEntity` call, which always hits the backend database. That `loadEntity` (step 6) yields either the same state, a different state (concurrent table update), or no entity at all (concurrent table drop). The "state" of the cached credentials may also refer to an older state of the table. Since constructing a `FileIO` instance adds database roundtrips, it also affects the runtime of all Iceberg table API operations. This is IMHO not necessary: the storage configuration is already available much earlier, so the database roundtrip can be avoided.

This leads to the "Remove noop code in persistence" PR [3], which simplifies the above breakdown, removes the unnecessary database roundtrip and decouples the storage concern of retrieving credentials from persistence concerns. A follow-up WIP PR [4] illustrates the full decoupling, especially for the `StorageCredentialsCache[Key]`. Not having to access persistence when a storage-related task runs is quite beneficial for tasks that run minutes/hours/days after being created, on separate machines or even in a different (delegation) service.

The goal of [3] and its follow-ups is to simplify the code base, eliminate complexity and simplify the flow.
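To make the chain above a bit more tangible, here is a heavily condensed, hypothetical sketch. The type and method names mirror the ones mentioned above, but the signatures are simplified stand-ins and do not match the actual Polaris interfaces:

```java
// Hypothetical, condensed illustration of steps 2-9 above. All nested interfaces are
// simplified stand-ins for the Polaris types of the same (or similar) name; the real
// signatures differ.
import java.util.Map;
import org.apache.iceberg.io.FileIO;

class CredentialChainSketch {

  interface ResolvedPathWrapper {            // stand-in for PolarisResolvedPathWrapper
    Object entityWithStorageConfig();        // step 3: entity, parent namespace(s) or catalog
  }

  interface CredentialVendor {               // stand-in for PolarisCredentialVendor, implemented
    Map<String, String> getSubscopedCredsForEntity(Object tableEntity); // by PolarisMetaStoreManager
  }

  interface CredentialCache {                // stand-in for StorageCredentialCache
    Map<String, String> getOrGenerateSubScopeCreds(CredentialVendor vendor, Object entity);
  }

  // Roughly what DefaultFileIOFactory.loadFileIO boils down to (steps 2-4):
  FileIO loadFileIO(ResolvedPathWrapper path, CredentialCache cache, CredentialVendor vendor) {
    Object entity = path.entityWithStorageConfig();
    Map<String, String> creds = cache.getOrGenerateSubScopeCreds(vendor, entity);
    // Steps 5-9 happen inside getSubscopedCredsForEntity: an extra loadEntity (database
    // roundtrip), the no-op loadPolarisStorageIntegration, getStorageIntegrationForConfig,
    // and finally the subscoped credentials from the storage integration (S3/GCS/Azure).
    return newFileIOWithProperties(creds);
  }

  FileIO newFileIOWithProperties(Map<String, String> credentialProperties) {
    throw new UnsupportedOperationException("FileIO construction omitted in this sketch");
  }
}
```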
As a result, [3] and its follow-ups also remove unnecessary request and runtime penalties.

A concern was raised [5] about existing custom implementations relying on internal details of the implementation. Ideally, custom implementations should only rely on the SPIs defined by the project and not on implementation details. I would like to propose that we take this opportunity to define what the SPI for storage/persistence is.

Thoughts?

Robert

[1] https://lists.apache.org/thread/n6r5lysjdjlgclbdk9rb7m4bqr7jnsv4
[2] https://lists.apache.org/thread/32ypmhn5wnvmltwpl4pydoxgg58xnzhs
[3] https://github.com/apache/polaris/pull/2277
[4] https://github.com/apache/polaris/pull/2278
[5] https://github.com/apache/polaris/pull/2277#pullrequestreview-3099564976