Hi Adam,

Working incrementally on this makes sense. I agree that handling internal Polaris workflows that deal with encrypted files sounds like a good starting point.

I wonder, though, if it is practically possible right now to produce an Iceberg table with encrypted data files so that Polaris could be tested in a realistic setting? Do you mean something like storing encrypted files directly from a client and later registering the table with Polaris?

This is not a blocker for starting KMS work of course. I'm just trying to understand how much of that feature can be practically usable ATM.

Cheers,
Dmitri.

On Tue, Jun 2, 2026 at 10:55 AM Adam Szita <[email protected]> wrote:

> Thanks everyone, this helps clarify the discussion.
>
> I think we should separate two related but different topics:
>
> 1. KMS/Vault credential vending to clients via Iceberg REST.
> 2. KMS configuration used by Polaris itself for server-side operations.
>
> I agree that #1 should be discussed on the Iceberg side and should not be invented as Polaris-specific behavior. I’m also happy to participate in it as I already have a working dev setup with REST client-side encryption enabled, plus a POC for catalog-level KMS configuration. I can help brainstorm/test concrete options but I do see this as a parallel workstream.
>
> For Polaris, though, I think #2 will be needed regardless of the final REST credential-vending implementation.
> Iceberg table encryption is coming, and Polaris server-side operations that read encrypted Iceberg artifacts will need KMS support. The immediate example is drop table with purge / table cleanup, where Polaris reads manifest lists and manifests to enumerate files for deletion. Those paths will need an EncryptingFileIO initialized with catalog-level KMS configuration.
>
> I also agree with the RFC that metadata integrity protection should be part of the first Polaris effort, since metadata.json is not encrypted and Polaris should detect out-of-band modification before trusting it for encrypted tables.
>
> So my suggested first phase would be limited to:
>
> - catalog-level KMS configuration (separate from storage configuration)
> - AWS KMS wiring for Polaris server-side operations
> - metadata integrity checks for encrypted tables
>
> The current RFC seems structured around a broader end-to-end table-encryption story (including client credential vending, key rotation, governance/lifecycle topics, and general Iceberg encryption background). Those are important, but I think it would be easier to make progress if we first split out and design the narrower Polaris server-side building block above, and discuss the broader pieces separately.
>
> Does that separation sound reasonable?
>
> Cheers,
> Adam
>
> On Thu, 28 May 2026 at 03:26, Yufei Gu <[email protected]> wrote:
>
> > Thanks Adam for raising this. I think it's a great feature to have.
> >
> > Agreed on what Prashant said. We need some work on the IRC side to avoid any premature implementation in Polaris.
> >
> > Yufei
> >
> > On Wed, May 27, 2026 at 9:14 AM Prashant Singh via dev <[email protected]> wrote:
> >
> > > Hey Adam,
> > >
> > > Thanks for starting a thread on this in the Polaris community.
> > > I believe we need a dedicated field in the loadTable response in IRC to vend KMS credentials. Currently, KMS credentials are mixed with storage credentials to achieve SSE, but there is no consistent way to enforce this because the spec is silent about it.
> > > With CSE (Iceberg v3 encryption), things get more involved because one can use Vault with S3 as the combination of their KMS and ObjectStore.
> > > Consequently, a catalog cannot provide access to both as part of the loadTable response. My take here is that if the catalog grants access to a caller because it has the SELECT privilege, it should provide access to both KMS and Storage.
> > >
> > > I have an open thread in the *Iceberg community* [1]. Let's conclude there what the IRC response should look like after consulting with the broader Iceberg catalog community (I added REST catalog encryption support in the last catalog community sync agenda but we ran out of time [2]), and then we can circle back in the Polaris community to see what it would look like to support this here.
> > >
> > > Best,
> > > Prashant
> > >
> > > [1] https://lists.apache.org/thread/z48t5wgx778j17pzto9kqxwysw4ysxxo
> > > [2] https://docs.google.com/document/d/1iPGVCIcr-M0XtAiudOguWAvmqIdVgpYN5vz5ohO8PKw/edit?tab=t.0#heading=h.cr6o1g2rn5hc
> > >
> > > On Wed, May 27, 2026 at 8:38 AM Alexandre Dutra <[email protected]> wrote:
> > >
> > > > Hi Adam, hi all,
> > > >
> > > > I did some archaeology on this topic and (unless I'm reading this wrong) it seems there is some previous work on this topic by Anand Sankaran. He sent his proposal to the Polaris dev mailing list in February [1] and wrote a design doc: [2]. Yufei also opened an issue a while ago: [3].
> > > >
> > > > I think that the best next step would be to revive Anand's design doc and see if it aligns with what you have in mind.
> > > >
> > > > I agree that this feature should be prioritized as it is extremely useful for users running on untrusted storage providers. However, if I understand the situation correctly, it seems that on the Iceberg side the feature is already in the REST spec, but client-side support is still pending [4] – it's been under review for a year. Is that assessment correct? (If so, this would be a good candidate for a feature branch on our side, while we wait for the 1.12 release to land.)
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > [1]: https://lists.apache.org/thread/mpg46o0w2bzy75hyhx2j74dgwzjh2ob7
> > > > [2]: https://docs.google.com/document/d/1f4Mgg5W1t4NT6R7KLq5K3S4pHlAwYwXTFwUR9uNNpSU/edit?tab=t.0#heading=h.7ucqpo88io4u
> > > > [3]: https://github.com/apache/polaris/issues/2829
> > > > [4]: https://github.com/apache/iceberg/pull/13225
> > > >
> > > > On Wed, May 27, 2026 at 10:55 AM Adam Szita <[email protected]> wrote:
> > > >
> > > > > Thanks for your replies Dmitri and JB,
> > > > >
> > > > > IIUC, the KMS integration you’re referring to is closely tied to AWS S3 storage. It is storage-layer encryption at rest: Polaris can record AWS KMS key ARNs in the S3 storage configuration, and during storage credential vending it grants the vended AWS credentials the required KMS permissions such as decrypt/encrypt/data-key operations. That lets clients read/write SSE-KMS encrypted S3 objects, but it is still a low-level storage concern and does not know whether the object is an Iceberg data file, manifest, or anything else.
> > > > >
> > > > > Iceberg table encryption is different. It is one abstraction level higher and is table-format aware:
> > > > >
> > > > > - under the hood an EncryptingFileIO is used to access encrypted artifacts
> > > > > - it uses envelope encryption to encrypt data files, manifest files and snapshot files, defining a master table key to be managed in a KMS (for some more context: https://www.youtube.com/watch?v=G7Y2eNS_d-s)
> > > > > - table metadata carries encryption metadata and key references; a KMS-backed `KeyManagementClient` wraps/unwraps the keys.
> > > > > - it provides better portability of encrypted tables, it's vendor independent - in theory you could have a combination of S3 storage with GCP KMS, or even a custom KMS client implementation should enterprise users favor that
> > > > > - supporting catalogs would have to bear additional responsibilities such as protecting metadata integrity and preventing master encryption key changes (which is an Iceberg table property)
> > > > >
> > > > > The catalog-level KMS config I’m proposing is for Iceberg table encryption, not for S3 SSE-KMS. It also shouldn't be modeled as storage configuration because the storage backend and table-encryption KMS provider do not have to match. Perhaps we could use a more concrete name such as icebergTableEncryptionKmsConfigInfo to avoid confusion.
> > > > > In any case I'm happy to draft a design doc and share it here.
> > > > >
> > > > > Cheers,
> > > > > Adam
> > > > >
> > > > > On Wed, 27 May 2026 at 08:07, Jean-Baptiste Onofré <[email protected]> wrote:
> > > > >
> > > > > > Hi Adam,
> > > > > >
> > > > > > Thanks for the proposal.
> > > > > >
> > > > > > I share Dmitri's question; my understanding is that this pertains to client-side encryption. I can confirm that KMS should work, as I recall an issue regarding this being fixed in the past.
> > > > > >
> > > > > > Adam, could you please clarify the scope of this work?
> > > > > >
> > > > > > Regards,
> > > > > > JB
> > > > > >
> > > > > > On Tue, May 26, 2026 at 8:01 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Adam,
> > > > > > >
> > > > > > > Thanks for this proposal!
> > > > > > >
> > > > > > > Polaris should already support storage-side KMS in AWS (and compatible systems) via [2802] (cf. [1]).
> > > > > > >
> > > > > > > I guess the new features you mention relate to client-side encryption, right?
> > > > > > >
> > > > > > > [1] https://polaris.apache.org/blog/2025/12/24/securing-s3-data-with-aws-kms/
> > > > > > >
> > > > > > > [2802] https://github.com/apache/polaris/pull/2802
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Dmitri.
> > > > > > >
> > > > > > > On Tue, May 26, 2026 at 11:06 AM Adam Szita <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > Iceberg 1.11 shipped the base implementation for table encryption, including KMS-based key wrapping/unwrapping and encrypted data/delete, manifest, and manifest-list files. REST catalog support is also being worked on in Iceberg (see https://github.com/apache/iceberg/pull/13225).
> > > > > > > >
> > > > > > > > I have been testing Polaris with Iceberg REST client-side encryption enabled. Basic catalog operations such as loadTable, commit/drop without purge, list, etc. work without Polaris changes because Polaris only needs the table metadata JSON for those paths, and metadata.json is not encrypted.
> > > > > > > >
> > > > > > > > The places where Polaris does need encryption awareness are the server-side paths that read encrypted Iceberg artifacts.
> > > > > > > > The first concrete example is drop table with purge: TableCleanupTask reads snapshot manifest lists and manifests to enumerate files for deletion, so it needs to use an EncryptingFileIO. The same would apply to any Polaris-side table maintenance/optimization, orphan/snapshot cleanup logic, or any future remote scan/planning capability that reads manifests or data/delete files.
> > > > > > > >
> > > > > > > > There is also a related but separate topic around vending KMS credentials to clients. That likely needs Iceberg REST spec work first, similar in spirit to current storage credential vending, so I think it should be designed for but not required as the first Polaris step.
> > > > > > > >
> > > > > > > > The first Polaris-side building block I would propose is to allow Iceberg catalogs to carry KMS configuration, similarly to how catalogs currently carry StorageConfigurationInfo. This should be separate from storage configuration because the storage backend and KMS provider may differ, for example GCS storage with AWS KMS. AWS KMS would be a reasonable first implementation target, using Iceberg’s existing KeyManagementClient/AWS KMS support, while leaving the model extensible for Azure and GCP.
> > > > > > > >
> > > > > > > > I have already been experimenting with this locally and would be happy to work on the Polaris changes. A possible first PR could be limited to:
> > > > > > > >
> > > > > > > > 1. Add catalog-level KMS configuration model/API support.
> > > > > > > > 2. Add AWS KMS server-side configuration wiring.
> > > > > > > >
> > > > > > > > Any feedback is welcome.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Adam
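
As a rough illustration of the catalog-level KMS configuration proposed in this thread, kept separate from storage configuration so that the storage backend and KMS provider can differ, a minimal Python sketch might look like the following. Every name here (`StorageConfig`, `TableEncryptionKmsConfig`, `CatalogConfig`, `SUPPORTED_KMS_PROVIDERS`, `kms_config_for_server_side_read`, the bucket and region values) is hypothetical and chosen only for illustration; none of it is an existing Polaris or Iceberg API, and the actual model would be settled in the design doc.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical stand-in for a catalog's storage configuration."""
    provider: str        # e.g. "S3", "GCS", "AZURE"
    base_location: str


@dataclass(frozen=True)
class TableEncryptionKmsConfig:
    """Hypothetical catalog-level KMS settings for Iceberg table encryption."""
    provider: str        # e.g. "AWS", "GCP"; independent of the storage provider
    region: Optional[str] = None


@dataclass(frozen=True)
class CatalogConfig:
    name: str
    storage: StorageConfig
    # None means no table-encryption KMS is configured for this catalog.
    table_encryption_kms: Optional[TableEncryptionKmsConfig] = None


# Assumption taken from the thread: AWS KMS is the first implementation target.
SUPPORTED_KMS_PROVIDERS = {"AWS"}


def kms_config_for_server_side_read(catalog: CatalogConfig) -> TableEncryptionKmsConfig:
    """Return the KMS config a server-side task (e.g. purge) would need, or raise."""
    kms = catalog.table_encryption_kms
    if kms is None:
        raise ValueError(f"catalog {catalog.name!r} has no table-encryption KMS config")
    if kms.provider not in SUPPORTED_KMS_PROVIDERS:
        raise NotImplementedError(f"KMS provider {kms.provider!r} is not wired yet")
    return kms


# GCS storage combined with AWS KMS, the mixed setup mentioned in the thread.
catalog = CatalogConfig(
    name="analytics",
    storage=StorageConfig(provider="GCS", base_location="gs://example-bucket/warehouse"),
    table_encryption_kms=TableEncryptionKmsConfig(provider="AWS", region="us-east-1"),
)
kms = kms_config_for_server_side_read(catalog)
```

Keeping the KMS settings in their own optional field, rather than inside the storage configuration, is what lets the two providers vary independently in this sketch; a catalog without the field simply cannot serve server-side reads of encrypted artifacts.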
