> if it is practically possible right now to produce an Iceberg table with encrypted data files so that Polaris could be tested in a realistic setting?
Yes, with the caveat that certain operations are not possible, as we
discussed, such as drop with purge and future scan planning.

Yufei

On Tue, Jun 9, 2026 at 8:53 AM Dmitri Bourlatchkov <[email protected]> wrote:

> Hi Adam,
>
> Working incrementally on this makes sense. I agree that handling internal
> Polaris workflows that deal with encrypted files sounds like a good
> starting point.
>
> I wonder, though, if it is practically possible right now to produce an
> Iceberg table with encrypted data files so that Polaris could be tested in
> a realistic setting? Do you mean something like storing encrypted files
> directly from a client and later registering the table with Polaris? This
> is not a blocker for starting KMS work of course. I'm just trying to
> understand how much of that feature can be practically usable ATM.
>
> Cheers,
> Dmitri.
>
> On Tue, Jun 2, 2026 at 10:55 AM Adam Szita <[email protected]> wrote:
>
> > Thanks everyone, this helps clarify the discussion.
> >
> > I think we should separate two related but different topics:
> >
> >    1. KMS/Vault credential vending to clients via Iceberg REST.
> >    2. KMS configuration used by Polaris itself for server-side
> >       operations.
> >
> > I agree that #1 should be discussed on the Iceberg side and should not be
> > invented as Polaris-specific behavior. I’m also happy to participate in it
> > as I already have a working dev setup with REST client-side encryption
> > enabled, plus a POC for catalog-level KMS configuration. I can help
> > brainstorm/test concrete options but I do see this as a parallel
> > workstream.
> >
> > For Polaris, though, I think #2 will be needed regardless of the final
> > REST credential-vending implementation.
> > Iceberg table encryption is coming, and Polaris server-side operations
> > that read encrypted Iceberg artifacts will need KMS support.
> > The immediate example is drop table with purge / table cleanup, where
> > Polaris reads manifest lists and manifests to enumerate files for
> > deletion. Those paths will need an EncryptingFileIO initialized with
> > catalog-level KMS configuration.
> >
> > I also agree with the RFC that metadata integrity protection should be
> > part of the first Polaris effort, since metadata.json is not encrypted
> > and Polaris should detect out-of-band modification before trusting it
> > for encrypted tables.
> >
> > So my suggested first phase would be limited to:
> >
> >    - catalog-level KMS configuration (separate from storage
> >      configuration)
> >    - AWS KMS wiring for Polaris server-side operations
> >    - metadata integrity checks for encrypted tables
> >
> > The current RFC seems structured around a broader end-to-end
> > table-encryption story (including client credential vending, key
> > rotation, governance/lifecycle topics, and general Iceberg encryption
> > background). Those are important, but I think it would be easier to make
> > progress if we first split out and design the narrower Polaris
> > server-side building block above, and discuss the broader pieces
> > separately.
> >
> > Does that separation sound reasonable?
> >
> > Cheers,
> > Adam
> >
> > On Thu, 28 May 2026 at 03:26, Yufei Gu <[email protected]> wrote:
> >
> > > Thanks Adam for raising this. I think it's a great feature to have.
> > >
> > > Agreed on what Prashant said. We need some work on the IRC side to
> > > avoid any premature implementation in Polaris.
> > >
> > > Yufei
> > >
> > > On Wed, May 27, 2026 at 9:14 AM Prashant Singh via dev
> > > <[email protected]> wrote:
> > >
> > > > Hey Adam,
> > > >
> > > > Thanks for starting a thread on this in the Polaris community.
> > > > I believe we need a dedicated field in the loadTable response in IRC
> > > > to vend KMS credentials.
> > > > Currently, KMS credentials are mixed with storage credentials to
> > > > achieve SSE, but there is no consistent way to enforce this because
> > > > the spec is silent about it.
> > > > With CSE (Iceberg v3 encryption), things get more involved because
> > > > one can use Vault with S3 as the combination of their KMS and
> > > > ObjectStore. Consequently, a catalog cannot provide access to both as
> > > > part of the loadTable response. My take is that if the catalog grants
> > > > access to a caller because it has the SELECT privilege, it should
> > > > provide access to both KMS and storage.
> > > >
> > > > I have an open thread in the *Iceberg community* [1]. Let's conclude
> > > > there what the IRC response should look like after consulting with
> > > > the broader Iceberg catalog community (I added REST catalog
> > > > encryption support in the last catalog community sync agenda but we
> > > > ran out of time [2]), and then we can circle back in the Polaris
> > > > community to see what it would look like to support this here.
> > > >
> > > > Best,
> > > > Prashant
> > > >
> > > > [1] https://lists.apache.org/thread/z48t5wgx778j17pzto9kqxwysw4ysxxo
> > > > [2] https://docs.google.com/document/d/1iPGVCIcr-M0XtAiudOguWAvmqIdVgpYN5vz5ohO8PKw/edit?tab=t.0#heading=h.cr6o1g2rn5hc
> > > >
> > > > On Wed, May 27, 2026 at 8:38 AM Alexandre Dutra <[email protected]> wrote:
> > > >
> > > > > Hi Adam, hi all,
> > > > >
> > > > > I did some archaeology on this topic and (unless I'm reading this
> > > > > wrong) it seems there is some previous work on this topic by Anand
> > > > > Sankaran. He sent his proposal to the Polaris dev mailing list in
> > > > > February [1] and wrote a design doc: [2]. Yufei also opened an issue
> > > > > a while ago: [3].
> > > > >
> > > > > I think that the best next step would be to revive Anand's design
> > > > > doc and see if it aligns with what you have in mind.
> > > > >
> > > > > I agree that this feature should be prioritized as it is extremely
> > > > > useful for users running on untrusted storage providers. However, if
> > > > > I understand the situation correctly, it seems that on the Iceberg
> > > > > side the feature is already in the REST spec, but client-side
> > > > > support is still pending [4] – it's been under review for a year. Is
> > > > > that assessment correct? (If so, this would be a good candidate for
> > > > > a feature branch on our side, while we wait for the 1.12 release to
> > > > > land.)
> > > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > > [1]: https://lists.apache.org/thread/mpg46o0w2bzy75hyhx2j74dgwzjh2ob7
> > > > > [2]: https://docs.google.com/document/d/1f4Mgg5W1t4NT6R7KLq5K3S4pHlAwYwXTFwUR9uNNpSU/edit?tab=t.0#heading=h.7ucqpo88io4u
> > > > > [3]: https://github.com/apache/polaris/issues/2829
> > > > > [4]: https://github.com/apache/iceberg/pull/13225
> > > > >
> > > > > On Wed, May 27, 2026 at 10:55 AM Adam Szita <[email protected]> wrote:
> > > > >
> > > > > > Thanks for your replies Dmitri and JB,
> > > > > >
> > > > > > IIUC, the KMS integration you’re referring to is closely tied to
> > > > > > AWS S3 storage. It is storage-layer encryption at rest: Polaris
> > > > > > can record AWS KMS key ARNs in the S3 storage configuration, and
> > > > > > during storage credential vending it grants the vended AWS
> > > > > > credentials the required KMS permissions such as
> > > > > > decrypt/encrypt/data-key operations.
> > > > > > That lets clients read/write SSE-KMS encrypted S3 objects, but it
> > > > > > is still a low-level storage concern and does not know whether the
> > > > > > object is an Iceberg data file, manifest, or anything else.
> > > > > >
> > > > > > Iceberg table encryption is different. It is one abstraction level
> > > > > > higher and is table-format aware:
> > > > > >
> > > > > >    - under the hood an EncryptingFileIO is used to access encrypted
> > > > > >      artifacts
> > > > > >    - it uses envelope encryption to encrypt data files, manifest
> > > > > >      files and snapshot files, defining a master table key to be
> > > > > >      managed in a KMS (for some more context:
> > > > > >      https://www.youtube.com/watch?v=G7Y2eNS_d-s)
> > > > > >    - table metadata carries encryption metadata and key references;
> > > > > >      a KMS-backed `KeyManagementClient` wraps/unwraps the keys.
> > > > > >    - it provides better portability of encrypted tables, it's vendor
> > > > > >      independent - in theory you could have a combination of S3
> > > > > >      storage with GCP KMS, or even a custom KMS client
> > > > > >      implementation should enterprise users favor that
> > > > > >    - supporting catalogs would have to bear additional
> > > > > >      responsibilities such as protecting metadata integrity and
> > > > > >      preventing master encryption key changes (which is an Iceberg
> > > > > >      table property)
> > > > > >
> > > > > > The catalog-level KMS config I’m proposing is for Iceberg table
> > > > > > encryption, not for S3 SSE-KMS. It also shouldn't be modeled as
> > > > > > storage configuration because the storage backend and
> > > > > > table-encryption KMS provider do not have to match, perhaps we
> > > > > > could use a more concrete naming such as
> > > > > > icebergTableEncryptionKmsConfigInfo to avoid confusion.
> > > > > > In any case I'm happy to draft a design doc and share it here.
> > > > > >
> > > > > > Cheers,
> > > > > > Adam
> > > > > >
> > > > > > On Wed, 27 May 2026 at 08:07, Jean-Baptiste Onofré <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Adam,
> > > > > > >
> > > > > > > Thanks for the proposal.
> > > > > > >
> > > > > > > I share Dmitri's question; my understanding is that this pertains
> > > > > > > to client-side encryption. I can confirm that KMS should work, as
> > > > > > > I recall an issue regarding this being fixed in the past.
> > > > > > >
> > > > > > > Adam, could you please clarify the scope of this work?
> > > > > > >
> > > > > > > Regards,
> > > > > > > JB
> > > > > > >
> > > > > > > On Tue, May 26, 2026 at 8:01 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Adam,
> > > > > > > >
> > > > > > > > Thanks for this proposal!
> > > > > > > >
> > > > > > > > Polaris should already support storage-side KMS in AWS (and
> > > > > > > > compatible systems) via [2802] (cf. [1]).
> > > > > > > >
> > > > > > > > I guess the new features you mention relate to client-side
> > > > > > > > encryption, right?
> > > > > > > >
> > > > > > > > [1] https://polaris.apache.org/blog/2025/12/24/securing-s3-data-with-aws-kms/
> > > > > > > >
> > > > > > > > [2802] https://github.com/apache/polaris/pull/2802
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Dmitri.
> > > > > > > >
> > > > > > > > On Tue, May 26, 2026 at 11:06 AM Adam Szita <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > Iceberg 1.11 shipped the base implementation for table
> > > > > > > > > encryption, including KMS-based key wrapping/unwrapping and
> > > > > > > > > encrypted data/delete, manifest, and manifest-list files.
> > > > > > > > > REST catalog support is also being worked on in Iceberg (see
> > > > > > > > > https://github.com/apache/iceberg/pull/13225).
> > > > > > > > >
> > > > > > > > > I have been testing Polaris with Iceberg REST client-side
> > > > > > > > > encryption enabled. Basic catalog operations such as
> > > > > > > > > loadTable, commit/drop without purge, list, etc. work without
> > > > > > > > > Polaris changes because Polaris only needs the table metadata
> > > > > > > > > JSON for those paths, and metadata.json is not encrypted.
> > > > > > > > >
> > > > > > > > > The places where Polaris does need encryption awareness are
> > > > > > > > > the server-side paths that read encrypted Iceberg artifacts.
> > > > > > > > > The first concrete example is drop table with purge:
> > > > > > > > > TableCleanupTask reads snapshot manifest lists and manifests
> > > > > > > > > to enumerate files for deletion, so it needs to use an
> > > > > > > > > EncryptingFileIO. The same would apply to any Polaris-side
> > > > > > > > > table maintenance/optimization, orphan/snapshot cleanup
> > > > > > > > > logic, or any future remote scan/planning capability that
> > > > > > > > > reads manifests or data/delete files.
> > > > > > > > >
> > > > > > > > > There is also a related but separate topic around vending KMS
> > > > > > > > > credentials to clients.
> > > > > > > > > That likely needs Iceberg REST spec work first, similar in
> > > > > > > > > spirit to current storage credential vending, so I think it
> > > > > > > > > should be designed for but not required as the first Polaris
> > > > > > > > > step.
> > > > > > > > >
> > > > > > > > > The first Polaris-side building block I would propose is to
> > > > > > > > > allow Iceberg catalogs to carry KMS configuration, similarly
> > > > > > > > > to how catalogs currently carry StorageConfigurationInfo.
> > > > > > > > > This should be separate from storage configuration because
> > > > > > > > > the storage backend and KMS provider may differ, for example
> > > > > > > > > GCS storage with AWS KMS. AWS KMS would be a reasonable first
> > > > > > > > > implementation target, using Iceberg’s existing
> > > > > > > > > KeyManagementClient/AWS KMS support, while leaving the model
> > > > > > > > > extensible for Azure and GCP.
> > > > > > > > >
> > > > > > > > > I have already been experimenting with this locally and would
> > > > > > > > > be happy to work on the Polaris changes. A possible first PR
> > > > > > > > > could be limited to:
> > > > > > > > >
> > > > > > > > >    1. Add catalog-level KMS configuration model/API support.
> > > > > > > > >    2. Add AWS KMS server-side configuration wiring.
> > > > > > > > >
> > > > > > > > > Any feedback is welcome.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Adam
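
For readers following the thread, here is a rough illustration of the first building block discussed above: a catalog that carries KMS configuration separately from its storage configuration, consulted by a server-side purge path. It is a minimal Python sketch under stated assumptions, not Polaris or Iceberg code. The names `StorageConfig`, `KmsConfig`, `Catalog`, and `plan_purge_io` are hypothetical, and whether a table is encrypted is passed in as a plain flag rather than derived from table metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical stand-in for a catalog's storage configuration."""
    storage_type: str      # e.g. "S3" or "GCS"
    base_location: str


@dataclass(frozen=True)
class KmsConfig:
    """Hypothetical catalog-level KMS configuration for table encryption."""
    kms_type: str          # e.g. "AWS"; other providers could be added later
    region: Optional[str] = None


@dataclass(frozen=True)
class Catalog:
    name: str
    storage: StorageConfig
    kms: Optional[KmsConfig] = None   # deliberately independent of `storage`


def plan_purge_io(catalog: Catalog, table_is_encrypted: bool) -> str:
    """Decide which kind of file IO a purge task would need.

    Returns "plain" for unencrypted tables and "encrypting" for encrypted
    ones. Raises ValueError when an encrypted table lives in a catalog
    that has no KMS configuration, since its manifests could not be read.
    """
    if not table_is_encrypted:
        return "plain"
    if catalog.kms is None:
        raise ValueError(
            f"catalog {catalog.name!r} has no KMS configuration; "
            "cannot read encrypted manifests for purge"
        )
    return "encrypting"


# Storage backend and KMS provider do not have to match (GCS with AWS KMS).
mixed = Catalog(
    name="analytics",
    storage=StorageConfig("GCS", "gs://example-bucket/warehouse"),
    kms=KmsConfig("AWS", region="us-east-1"),
)
print(plan_purge_io(mixed, table_is_encrypted=True))
```

Keeping `kms` as its own optional field, rather than folding it into `StorageConfig`, mirrors the point made in the thread that the storage backend and the table-encryption KMS provider may differ.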
