Hi all,

While looking into how “purge table”, or any other Iceberg table/view
related operation, can be integrated into the task-handling
proposals, it became apparent that retrieving an Iceberg `FileIO`
instance requires a surprisingly long series of operations.

For posterity: the async-reliable-tasks proposal [1] ensures that
tasks are eventually run. The delegation service proposal [2] adds the
option to execute tasks in a separate service, i.e. not on Polaris
nodes/pods. Naturally, (table related) task executions need the
storage location as well as the configuration and credentials to
access it.

The following is a (shortened) breakdown of what happens when storage
credentials are retrieved.

The implementation of `TaskFileIOSupplier` is supposed to provide the
`FileIO` for a task (think: “purge table files”). To achieve this, the
Polaris code base performs the following series of operations:

1. Construct a `ResolvedPolarisEntity` with the table entity, but
without the catalog and namespace entities (*)
2. Call `DefaultFileIOFactory.loadFileIO` (with the
`ResolvedPolarisEntity` as a `PolarisResolvedPathWrapper`)
3. `DefaultFileIOFactory.loadFileIO` extracts from the
`PolarisResolvedPathWrapper` the Polaris entity that carries the
"right" storage configuration (the table entity itself, a parent
namespace or the catalog).
4. It then calls `StorageCredentialCache.getOrGenerateSubScopeCreds`, which...
5. ... calls `PolarisCredentialVendor.getSubscopedCredsForEntity`,
which has to be implemented by all `PolarisMetaStoreManager`
implementations, and which ...
6. ... loads the Polaris table entity (the dropped one?) to then ...
7. ... call `loadPolarisStorageIntegration`, which ...
8. ... calls `PolarisStorageIntegrationProvider.getStorageIntegrationForConfig`
to ...
9. ... get the (subscoped) credentials from the returned
`PolarisStorageIntegration` (for S3, GCS, Azure).


(*): The namespace and catalog entities aren't needed here, because
the Iceberg drop-table implementation (`IcebergCatalog.dropTable`)
extracts the resolved storage configuration and passes it to
`PolarisMetaStoreManager.dropEntityIfExists`, from where it eventually
ends up in the task entity.

The above also applies to every operation that accesses the object
storage, like an Iceberg 'loadTable' operation (minus the
task-specific steps).
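For illustration, the chain above can be condensed into a
self-contained sketch. The types below are simplified stand-ins for
the real Polaris classes (the names echo the steps above, but the
signatures are hypothetical); a roundtrip counter makes the extra
`loadEntity` call from step 6 visible:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-ins for the Polaris types named in the steps above;
// signatures are hypothetical, only the call sequence mirrors the text.
class Sketch {
    static final AtomicInteger dbRoundtrips = new AtomicInteger();

    record StorageConfig(String provider, String baseLocation) {}
    record TableEntity(long id, StorageConfig storageConfig) {}
    record Credentials(String token) {}

    // Step 6: the credential vendor re-loads the entity from the backend
    // database, even though the resolved storage configuration was
    // already available in step 3.
    static TableEntity loadEntity(long id, StorageConfig cfg) {
        dbRoundtrips.incrementAndGet(); // always hits the database
        return new TableEntity(id, cfg);
    }

    // Steps 7-9: the storage integration subscopes the credentials;
    // no persistence is involved at this point.
    static Credentials getSubScopedCreds(StorageConfig cfg) {
        return new Credentials("subscoped-for-" + cfg.baseLocation());
    }

    // Steps 3-9 rolled into one call, loosely mirroring
    // DefaultFileIOFactory.loadFileIO.
    static Credentials loadFileIOCreds(TableEntity resolved) {
        StorageConfig cfg = resolved.storageConfig();          // step 3: config already in hand
        TableEntity reloaded = loadEntity(resolved.id(), cfg); // step 6: redundant roundtrip
        return getSubScopedCreds(reloaded.storageConfig());
    }

    public static void main(String[] args) {
        TableEntity table = new TableEntity(42L,
                new StorageConfig("s3", "s3://bucket/warehouse/db/tbl"));
        Credentials creds = loadFileIOCreds(table);
        System.out.println(creds.token() + ", db roundtrips: " + dbRoundtrips.get());
    }
}
```

The point of the sketch is only that the configuration consumed in the
last step is the same one that was already resolved in step 3.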


`IntegrationPersistence.loadPolarisStorageIntegration` does not do
anything persistence-related (no actual "loading") in any of its
implementations. All "counterpart" implementations of
`persistStorageIntegrationIfNeeded` are no-ops.
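Concretely, the pattern looks roughly like this (a minimal
reconstruction for illustration; the method names match Polaris, the
bodies and signatures are invented):

```java
// Minimal reconstruction of the no-op pattern described above.
interface IntegrationPersistence {
    Object loadPolarisStorageIntegration(long entityId);
    void persistStorageIntegrationIfNeeded(Object entity);
}

class SomeMetaStore implements IntegrationPersistence {
    @Override
    public Object loadPolarisStorageIntegration(long entityId) {
        // Despite the name, nothing is loaded from persistence: the
        // integration is rebuilt from configuration already in memory.
        return new Object();
    }

    @Override
    public void persistStorageIntegrationIfNeeded(Object entity) {
        // no-op in all implementations
    }
}
```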

Operations that need a `FileIO` instance perform an additional
`loadEntity` call, which always hits the backend database. That
`loadEntity` (step #6) may yield the same state, a different state
(concurrent table update), or no entity at all (concurrent table
drop). The cached credentials may likewise refer to an older state of
the table.

Since every construction of a `FileIO` instance adds a database
roundtrip, this also affects the runtime of all Iceberg table API
operations. IMHO this is unnecessary: the storage configuration is
already available much earlier, so the extra roundtrip can be avoided.

This leads to the "Remove noop code in persistence" PR [3], which
simplifies the above breakdown, removes the unnecessary database
roundtrip and decouples the storage concern of retrieving credentials
from persistence concerns. A follow-up WIP PR [4] illustrates the full
decoupling, especially for the `StorageCredentialsCache[Key]`.
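To make the idea tangible, here is a hypothetical sketch of such a
decoupled cache (all names invented, not taken from the PRs): the
cache key is derived from the storage configuration itself, so no
entity reload and no database access is needed at credential time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: credentials keyed by the storage configuration
// that is already in hand, rather than by a re-loaded entity.
class DecoupledSketch {
    record StorageConfig(String provider, String baseLocation) {}
    record Credentials(String token) {}

    static final Map<StorageConfig, Credentials> cache = new ConcurrentHashMap<>();

    static Credentials getOrGenerate(StorageConfig cfg) {
        // Records give value-based equality, so equal configurations
        // hit the same cache entry without any persistence lookup.
        return cache.computeIfAbsent(cfg,
                c -> new Credentials("subscoped-for-" + c.baseLocation()));
    }

    public static void main(String[] args) {
        StorageConfig cfg = new StorageConfig("s3", "s3://bucket/warehouse");
        Credentials first = getOrGenerate(cfg);
        Credentials second = getOrGenerate(cfg); // cache hit, no DB call
        System.out.println(first == second);     // prints "true"
    }
}
```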

Not requiring access to persistence when a storage-related task runs
is quite beneficial for tasks that run minutes, hours or days after
being created, on separate machines, or even in a different
(delegation) service.


The goal of [3] and its follow-ups is to simplify the code base,
eliminate complexity and streamline the flow. As a result, it also
removes unnecessary request and runtime penalties. A concern was
raised [5] about existing custom implementations relying on internal
details of the implementation. Ideally, custom implementations should
only rely on the SPIs defined by the project, not on implementation
details. I would like to propose that we take this opportunity to
define what the SPI for storage/persistence is.


Thoughts?

Robert


[1] https://lists.apache.org/thread/n6r5lysjdjlgclbdk9rb7m4bqr7jnsv4
[2] https://lists.apache.org/thread/32ypmhn5wnvmltwpl4pydoxgg58xnzhs
[3] https://github.com/apache/polaris/pull/2277
[4] https://github.com/apache/polaris/pull/2278
[5] https://github.com/apache/polaris/pull/2277#pullrequestreview-3099564976
