Hi Gidon, I would like to join the meeting.
Thanks,
Maya Anderson

---------- Forwarded message ---------
> From: Gidon Gershinsky <gg5...@gmail.com>
> Date: Wed, Sep 1, 2021 at 9:11 PM
> Subject: Fwd: Data encryption in Iceberg
> To: <dev@iceberg.apache.org>
>
> Hi all,
>
> Per the sync this morning, we'll have a meeting on encryption-related efforts in Iceberg. Before we discuss the day/time options, let us know who is interested in joining: please respond here or send a direct message to Ryan, Jack or me.
>
> Cheers, Gidon
>
>
> ---------- Forwarded message ---------
> From: Gidon Gershinsky <gg5...@gmail.com>
> Date: Mon, Aug 30, 2021 at 5:57 PM
> Subject: Re: Data encryption in Iceberg
> To: <dev@iceberg.apache.org>
>
> Hi Jack,
>
> Thank you. We've indeed been busy building the Iceberg data encryption code, since we see considerable demand for this functionality (with timeline requirements).
> I've published an initial end-to-end implementation (PR 3053), comprising new code that handles the generation of data keys, plus the existing code (with some modifications) from the current PRs listed below. So this is joint work, with contributions from both of us; I'm sure there are ways to recognize PR co-authorship :).
>
> As I mentioned, this is the simplest version (without double wrapping, column-specific master keys and two-tier key management). I have a prototype of these advanced data encryption features, but thought it best to start with an MVP: it is easier for the community to digest, and it allows a gradual, layer-by-layer implementation.
> In my understanding, the MVP can start without key rotation, because key rotation has two parts. The main one (key rotation in the KMS) is totally transparent to Iceberg. The other part (re-wrapping of key_metadata and re-writing of manifest files and manifest lists) is required only in threat models that cover the risk of master keys being compromised or leaked, so it is a less universal requirement and can be added post-MVP. But if you hold a different view on this, or need the second part of key rotation now, I'm sure this is doable; I just hope it won't slow down the MVP work.
>
> Having said that, there is a feature I believe would be a really good addition to the MVP: the encryption of manifests and manifest lists. I presume you refer to it in your mail. If you have an internal branch with its implementation, porting it to open source would be much appreciated. We need this capability: yes, the data is encrypted, but the stats are not, which is not great, even though they are highly aggregated (a sort of range mask).
>
> We can chat about this at the upcoming sync, but I support the suggestion to set up a more detailed discussion to align the encryption-related efforts.
>
> Cheers, Gidon
>
>
> On Sun, Aug 29, 2021 at 11:08 PM Jack Ye <yezhao...@gmail.com> wrote:
>
>> Hi Gidon and Huaxin,
>>
>> Thanks for continuing the effort on Iceberg encryption support. I have not had enough time to work on this area since the design discussion; so far I have only managed to add key metadata for manifest files, and there are quite a few changes in our internal branch that I need to port to open source. I will start doing so in the next few days.
>>
>> Regarding the design, I wonder if we should start by defining the actions API, with a Spark implementation for file encryption key rotation, and then discuss the user experience.
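The second part of key rotation described above (re-wrapping key_metadata) follows the usual envelope-encryption pattern. The data encryption key (DEK) is stored only in wrapped form, so rotating the master key means unwrapping and re-wrapping that small blob rather than re-encrypting any data file. The following is a minimal Java sketch under stated assumptions. It uses the JDK's AES Key Wrap cipher ("AESWrap") with locally generated master keys, whereas in a real deployment the master key would stay inside a KMS. The byte array standing in for key_metadata and the class and method names are hypothetical, and do not reflect Iceberg's actual key_metadata format or API.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical illustration of envelope encryption and key re-wrapping;
// not Iceberg code. Master keys are local here only for the demo.
public class RewrapSketch {

  // Generate a fresh 128-bit AES key (used for both master keys and DEKs here).
  static SecretKey newAesKey() throws GeneralSecurityException {
    KeyGenerator gen = KeyGenerator.getInstance("AES");
    gen.init(128);
    return gen.generateKey();
  }

  // Wrap a DEK with a master key using AES Key Wrap (RFC 3394).
  static byte[] wrap(SecretKey masterKey, SecretKey dek) throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AESWrap");
    cipher.init(Cipher.WRAP_MODE, masterKey);
    return cipher.wrap(dek);
  }

  // Recover the DEK from its wrapped form.
  static SecretKey unwrap(SecretKey masterKey, byte[] wrapped) throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AESWrap");
    cipher.init(Cipher.UNWRAP_MODE, masterKey);
    return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
  }

  // Re-wrap: unwrap with the old master key, wrap with the new one.
  // The DEK itself (and therefore every data file it encrypts) is unchanged.
  static byte[] rewrap(SecretKey oldMaster, SecretKey newMaster, byte[] wrapped)
      throws GeneralSecurityException {
    return wrap(newMaster, unwrap(oldMaster, wrapped));
  }

  public static void main(String[] args) throws Exception {
    SecretKey oldMaster = newAesKey();
    SecretKey newMaster = newAesKey();
    SecretKey dek = newAesKey();

    byte[] keyMetadata = wrap(oldMaster, dek);              // stand-in for key_metadata
    byte[] rotated = rewrap(oldMaster, newMaster, keyMetadata);
    SecretKey recovered = unwrap(newMaster, rotated);

    // The same DEK is recovered under the new master key.
    System.out.println(Arrays.equals(dek.getEncoded(), recovered.getEncoded()));
  }
}
```

Only the wrapped blob changes during rotation. Any metadata that stores it, presumably manifest entries and, in turn, the manifest lists pointing to them, would then need to be rewritten, which matches the thread's point that this step also touches manifest files and manifest lists.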
>>
>> In the original design document, I think we did not reach a consensus with the community on how to actually expose key rotation functionality. In Spark, we can either do it through a DDL extension or implement it as a procedure. Given that this is a long-running distributed procedure, my feeling is that the community will lean towards a procedure call.
>>
>> We can continue the discussion on this while doing the detailed implementation first. Let's set up a meeting so that we can align the efforts.
>>
>> Best,
>> Jack Ye
>>
>>
>> On Wed, Aug 25, 2021 at 4:19 AM Gidon Gershinsky <gg5...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> We briefly discussed this subject in a June sync, with a decision to continue via the mailing list.
>>> There are a number of pull requests from Jack and me that implement a set of disjoint elements from the high-level design <https://docs.google.com/document/d/1kkcjr9KrlB9QagRX3ToulG_Rf-65NMSlVANheDNzJq4/edit?usp=sharing>.
>>> Some low-level details, such as the generation and propagation of data keys, are not covered in that document.
>>> I have created a short (and hopefully simple) doc
>>> https://docs.google.com/document/d/19O_qiQumz_66CdWLpw38GFJEsUpnNxXckP9rnYIQnCo/edit?usp=sharing
>>> that focuses on these details and describes a bottom-up approach to the generation of data keys, the encryption of data/delete files, and options/phases for optimizing key management. The scope of the document is intentionally narrow; it currently focuses on the simplest, minimal option. Reviews are very welcome. Later, this doc will be merged into (or referenced from) the master design document.
>>>
>>> A PR with a basic encryption DDL was recently sent by Huaxin; you can find it here <https://github.com/apache/iceberg/pull/3013>. Next week, I'll send a pull request with an implementation of the minimal encryption option.
>>> This pull request collects the basics from my PRs 2639, 2638 and 2640 and Jack's PR 2443, adding the key generation and other code needed for an end-to-end implementation of the minimal design <https://docs.google.com/document/d/19O_qiQumz_66CdWLpw38GFJEsUpnNxXckP9rnYIQnCo/edit?usp=sharing>.
>>> This PR comes with an example proposed by Ryan: using a table encryption key from a keyfile ("pkcs12" format, the closest thing to the "pem" format for symmetric keys).
>>> Besides the minimal version, I have a draft implementation of more advanced data encryption options (including per-column keys, double wrapping and two-tier management, all described in the master design doc), but let's take this one step at a time, starting with the simplest option.
>>>
>>> Cheers, Gidon
>>>
>>
--
Regards, Maya
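The thread mentions reading a table encryption key from a "pkcs12" keyfile, but it does not show how PR 3053 actually loads that file. As a hedged illustration only, the following sketch uses the standard JDK `java.security.KeyStore` API to store a symmetric AES key in a PKCS12 keystore and read it back. The alias `table-key`, the password, and the class and method names are hypothetical. To stay self-contained, the keystore is written to and read from memory rather than from a file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.security.Key;
import java.security.KeyStore;
import java.util.Arrays;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical illustration of a PKCS12 keyfile holding a symmetric table key;
// not the code from the PR discussed in the thread.
public class KeyfileSketch {

  // Build PKCS12 keystore bytes containing one secret-key entry.
  static byte[] createKeyfile(String alias, SecretKey key, char[] password) throws Exception {
    KeyStore ks = KeyStore.getInstance("PKCS12");
    ks.load(null, null); // initialize an empty keystore
    ks.setEntry(alias, new KeyStore.SecretKeyEntry(key), new KeyStore.PasswordProtection(password));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    ks.store(out, password);
    return out.toByteArray();
  }

  // Load the keystore and return the secret key stored under the given alias.
  static SecretKey loadTableKey(InputStream keyfile, String alias, char[] password) throws Exception {
    KeyStore ks = KeyStore.getInstance("PKCS12");
    ks.load(keyfile, password);
    Key key = ks.getKey(alias, password);
    if (!(key instanceof SecretKey)) {
      throw new IllegalStateException("No secret key under alias " + alias);
    }
    return (SecretKey) key;
  }

  public static void main(String[] args) throws Exception {
    char[] password = "demo-password".toCharArray(); // hypothetical; use a secret store in practice
    KeyGenerator gen = KeyGenerator.getInstance("AES");
    gen.init(128);
    SecretKey tableKey = gen.generateKey();

    byte[] keyfile = createKeyfile("table-key", tableKey, password);
    SecretKey loaded = loadTableKey(new ByteArrayInputStream(keyfile), "table-key", password);

    System.out.println(Arrays.equals(tableKey.getEncoded(), loaded.getEncoded()));
  }
}
```

PKCS12 can hold symmetric keys as secret-key entries, which is presumably why it serves here as a stand-in for "pem" (a format mostly used for asymmetric keys and certificates). In a real deployment the password would come from a secret store rather than a string literal.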