johnclara opened a new pull request #1918:
URL: https://github.com/apache/iceberg/pull/1918


   ### Goal:
   Share maintenance of the symmetric-key encryption logic across the community.
   
   Eventually this can be extended with specific DEK providers (e.g., AWS-specific implementations backed by AWS KMS in iceberg-aws) and CryptoFormats (e.g., AES GCM/CTR, similar to the S3 encryption client in AWS Java SDK v1).
   
   ### High level idea:
   A Table object will be configured with a KekId (the ID of a key encryption key, or KEK) for encrypting new data files.
   
   On write:
   The table will pass this KekId to a DekProvider that supports it and retrieve a DEK (data encryption key) in both encrypted and plaintext form.
   
   After using the plaintext DEK to encrypt the data file, the table will save the encrypted DEK, along with the KekId used to generate it, in the EncryptionKeyMetadata.
   
   On read:
   The table will load the KekId and the encrypted DEK from the EncryptionKeyMetadata and ask the DekProvider for the plaintext DEK. It will then use the plaintext DEK to decrypt the file.
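
The write and read paths above follow the usual envelope-encryption pattern. Below is a minimal sketch, assuming AES-256-GCM for both the data file and the DEK, and a locally generated AES key standing in for the KEK (a real DekProvider would keep the KEK in an external key store). `EnvelopeSketch` and its methods are hypothetical names for illustration, not Iceberg APIs.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of envelope encryption. A local AES key stands in for
// the KEK that a real DekProvider would keep in an external key store.
final class EnvelopeSketch {
  private static final SecureRandom RANDOM = new SecureRandom();
  private static final int IV_BYTES = 12;   // standard GCM nonce length
  private static final int TAG_BITS = 128;  // GCM authentication tag length

  // AES-GCM encrypt; output layout is iv || ciphertext-with-tag.
  static byte[] seal(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
    byte[] iv = new byte[IV_BYTES];
    RANDOM.nextBytes(iv);
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
    byte[] body = cipher.doFinal(plaintext);
    byte[] out = Arrays.copyOf(iv, IV_BYTES + body.length);
    System.arraycopy(body, 0, out, IV_BYTES, body.length);
    return out;
  }

  // Inverse of seal: reads the IV prefix, then decrypts and verifies the tag.
  static byte[] open(SecretKey key, byte[] sealed) throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, sealed, 0, IV_BYTES));
    return cipher.doFinal(sealed, IV_BYTES, sealed.length - IV_BYTES);
  }

  static SecretKey newAesKey() {
    try {
      KeyGenerator generator = KeyGenerator.getInstance("AES");
      generator.init(256);
      return generator.generateKey();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  // Write: a fresh DEK encrypts the file, and the KEK encrypts the DEK.
  // Returns {encryptedDek, encryptedFile}; only encryptedDek (plus the KekId)
  // would be persisted in the key metadata, never the plaintext DEK.
  static byte[][] write(SecretKey kek, byte[] fileBytes) {
    try {
      SecretKey dek = newAesKey();
      return new byte[][] {seal(kek, dek.getEncoded()), seal(dek, fileBytes)};
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  // Read: recover the plaintext DEK with the KEK, then decrypt the file with it.
  static byte[] read(SecretKey kek, byte[] encryptedDek, byte[] encryptedFile) {
    try {
      SecretKey dek = new SecretKeySpec(open(kek, encryptedDek), "AES");
      return open(dek, encryptedFile);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

GCM also authenticates the ciphertext, so a tampered file or DEK fails to decrypt instead of silently producing garbage.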
   
   ### This Implementation
   #### Loading and Dumping
   Because the table needs to be initialized with a KekId, the KekId already has to be loadable from a Map<String, String> of properties (either table properties or catalog properties).
   
   For this reason, I'm also storing the KekId in the EncryptionKeyMetadata as a Map<String, String> and relying on each DekProvider to parse it. What do you all think about this? Is this a terrible idea?
   
   For legacy reasons on my team's side, the DEK is also loaded the same way, but this should probably be split out.
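
A minimal sketch of the load/dump idea, assuming hypothetical key names: an `encryption.kek.` prefix in table or catalog properties, and a `kek.` prefix plus an `encrypted-dek` entry in the key metadata map. It pulls the KekId out of the properties, flattens it together with the base64-encoded encrypted DEK into one `Map<String, String>`, and reads both back. `KeyMetadataMaps` is a hypothetical helper; turning the map into the bytes of EncryptionKeyMetadata is left out.

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical layout: KekId fields live under a "kek." prefix and the
// encrypted DEK is stored base64-encoded under "encrypted-dek".
final class KeyMetadataMaps {
  static final String KEK_PREFIX = "kek.";
  static final String ENCRYPTED_DEK = "encrypted-dek";

  // Extracts every entry under `prefix`, with the prefix stripped.
  // Used both for table/catalog properties and for stored key metadata.
  static Map<String, String> kekIdFromProperties(Map<String, String> props, String prefix) {
    Map<String, String> kekId = new HashMap<>();
    for (Map.Entry<String, String> e : props.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        kekId.put(e.getKey().substring(prefix.length()), e.getValue());
      }
    }
    return kekId;
  }

  // "Dump": flatten the KekId and the encrypted DEK into a single string map.
  static Map<String, String> dump(Map<String, String> kekId, byte[] encryptedDek) {
    Map<String, String> out = new HashMap<>();
    kekId.forEach((k, v) -> out.put(KEK_PREFIX + k, v));
    out.put(ENCRYPTED_DEK, Base64.getEncoder().encodeToString(encryptedDek));
    return out;
  }

  // "Load": recover the KekId map that a DekProvider would interpret.
  static Map<String, String> loadKekId(Map<String, String> keyMetadata) {
    return kekIdFromProperties(keyMetadata, KEK_PREFIX);
  }

  // "Load": recover the encrypted DEK bytes.
  static byte[] loadEncryptedDek(Map<String, String> keyMetadata) {
    return Base64.getDecoder().decode(keyMetadata.get(ENCRYPTED_DEK));
  }
}
```

Prefixing the KekId fields keeps them from colliding with the DEK entry, which should make it easier to split the two apart later.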
   
   #### Dek Providers
   
   **KmsDekProvider:** Retrieves a DEK from an external key store. It is written with AWS KMS in mind (this PR was motivated by switching to AWS Java SDK v2 in order to use the iceberg-aws libraries), and the same implementation should also work for GCP KMS.
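
A sketch of the KmsDekProvider shape, assuming a hypothetical `KeyStoreClient` interface modeled on the KMS GenerateDataKey and Decrypt operations (in AWS SDK v2 these correspond to `KmsClient.generateDataKey` and `KmsClient.decrypt`). `Dek`, `DekProvider`, the `key-id` KekId field, and the in-memory `FakeKeyStore` are illustrative only; the fake exists so the sketch runs without cloud credentials.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical types, not an existing Iceberg API.
record Dek(byte[] plaintext, byte[] encrypted) {}

interface DekProvider {
  Dek createDek(Map<String, String> kekId);
  byte[] decryptDek(Map<String, String> kekId, byte[] encryptedDek);
}

// Hypothetical minimal view of a KMS: generate a data key, decrypt a blob.
interface KeyStoreClient {
  Dek generateDataKey(String keyId, int numBytes);
  byte[] decrypt(String keyId, byte[] ciphertextBlob);
}

final class KmsDekProvider implements DekProvider {
  static final String KEY_ID = "key-id";  // hypothetical KekId field
  private final KeyStoreClient client;

  KmsDekProvider(KeyStoreClient client) {
    this.client = client;
  }

  private static String keyId(Map<String, String> kekId) {
    String keyId = kekId.get(KEY_ID);
    if (keyId == null) {
      throw new IllegalArgumentException("KekId is missing " + KEY_ID);
    }
    return keyId;
  }

  @Override
  public Dek createDek(Map<String, String> kekId) {
    return client.generateDataKey(keyId(kekId), 32);  // 256-bit DEK
  }

  @Override
  public byte[] decryptDek(Map<String, String> kekId, byte[] encryptedDek) {
    return client.decrypt(keyId(kekId), encryptedDek);
  }
}

// In-memory fake for testing only: the "ciphertext" is an opaque random token.
final class FakeKeyStore implements KeyStoreClient {
  private final SecureRandom random = new SecureRandom();
  private final Map<String, byte[]> keys = new HashMap<>();

  @Override
  public Dek generateDataKey(String keyId, int numBytes) {
    byte[] plaintext = new byte[numBytes];
    byte[] blob = new byte[16];
    random.nextBytes(plaintext);
    random.nextBytes(blob);
    keys.put(keyId + ":" + Arrays.toString(blob), plaintext.clone());
    return new Dek(plaintext, blob);
  }

  @Override
  public byte[] decrypt(String keyId, byte[] ciphertextBlob) {
    byte[] plaintext = keys.get(keyId + ":" + Arrays.toString(ciphertextBlob));
    if (plaintext == null) {
      throw new IllegalArgumentException("Unknown key or ciphertext");
    }
    return plaintext.clone();
  }
}
```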
   
   **PlaintextDekProvider:** Does not actually encrypt the DEK when it is stored in the EncryptionKeyMetadata. This is for users who encrypt their manifests, where the key metadata is stored, so the DEK is still protected at rest.
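
A minimal sketch of a PlaintextDekProvider, using the same hypothetical `Dek`/`DekProvider` shapes (redeclared so the snippet compiles on its own). The "encrypted" DEK is simply a copy of the plaintext bytes, so decryption is a copy as well; the KekId is ignored.

```java
import java.security.SecureRandom;
import java.util.Map;

// Hypothetical types, not an existing Iceberg API.
record Dek(byte[] plaintext, byte[] encrypted) {}

interface DekProvider {
  Dek createDek(Map<String, String> kekId);
  byte[] decryptDek(Map<String, String> kekId, byte[] encryptedDek);
}

// Stores the DEK as-is. Only appropriate when whatever holds the key
// metadata (e.g. an encrypted manifest) is itself protected.
final class PlaintextDekProvider implements DekProvider {
  private final SecureRandom random = new SecureRandom();

  @Override
  public Dek createDek(Map<String, String> kekId) {
    byte[] key = new byte[32];  // 256-bit DEK
    random.nextBytes(key);
    return new Dek(key, key.clone());
  }

  @Override
  public byte[] decryptDek(Map<String, String> kekId, byte[] encryptedDek) {
    return encryptedDek.clone();
  }
}
```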
   
   
   **DelegatingDekProvider:** Some catalogs will support multiple types of DEK providers, and some tables might migrate from one DEK provider to another. The DelegatingDekProvider will read a field in the KekId that names the target DekProvider and delegate to it accordingly.
   
   #### How to use
   At one point I was planning to have only the DelegatingDekProvider and to load DekProviders dynamically with a ServiceLoader. After @rdblue pointed out that ServiceLoader lookups are slow, I'm planning to just hardcode a map of the DekProviders we use in our catalog implementation.
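
A sketch of the DelegatingDekProvider backed by a hardcoded registry instead of a ServiceLoader, assuming the KekId carries a hypothetical `provider` field naming the target. The provider names and the `Dek`/`DekProvider` shapes are illustrative, not an existing API.

```java
import java.util.Map;

// Hypothetical types, not an existing Iceberg API.
record Dek(byte[] plaintext, byte[] encrypted) {}

interface DekProvider {
  Dek createDek(Map<String, String> kekId);
  byte[] decryptDek(Map<String, String> kekId, byte[] encryptedDek);
}

// Routes each call to the provider named by the KekId's "provider" field.
final class DelegatingDekProvider implements DekProvider {
  static final String PROVIDER_FIELD = "provider";  // hypothetical field name
  private final Map<String, DekProvider> providers;

  DelegatingDekProvider(Map<String, DekProvider> providers) {
    this.providers = Map.copyOf(providers);  // immutable, hardcoded registry
  }

  private DekProvider target(Map<String, String> kekId) {
    String name = kekId.get(PROVIDER_FIELD);
    // Map.copyOf maps reject get(null), so check for a missing field first.
    DekProvider provider = name == null ? null : providers.get(name);
    if (provider == null) {
      throw new IllegalArgumentException("No DekProvider registered for: " + name);
    }
    return provider;
  }

  @Override
  public Dek createDek(Map<String, String> kekId) {
    return target(kekId).createDek(kekId);
  }

  @Override
  public byte[] decryptDek(Map<String, String> kekId, byte[] encryptedDek) {
    return target(kekId).decryptDek(kekId, encryptedDek);
  }
}
```

A catalog could build this once, e.g. `new DelegatingDekProvider(Map.of("kms", kmsProvider, "plaintext", plaintextProvider))`. A table moving between providers then only needs a new `provider` value in its configured KekId for new files, while older files still resolve through the KekId stored in their key metadata.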
   
   @mccheah, is this how the EncryptionManager is expected to be used, or am I totally off base here?
   
   Also, @jackye1995, could we potentially hook AWS KMS in here?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.