johnclara opened a new pull request #1918: URL: https://github.com/apache/iceberg/pull/1918
### Goal:

Share maintenance of the logic for symmetric key encryption within the community. Eventually this can be extended with specific DEK providers (e.g. AWS-specific implementations for AWS KMS in iceberg-aws) and CryptoFormats (e.g. AES GCM/CTR, similar to the S3 encryption client from AWS Java SDK 1).

### High level idea:

A Table object will be configured with a KekId for encrypting new data files.

On write: The table will use this KekId to reach out to a DekProvider that supports the KekId and retrieve a DEK in both encrypted and plaintext form. After using the plaintext DEK to encrypt the data file, it will save the encrypted DEK and the KekId used to generate it inside of EncryptionKeyMetadata.

On read: The table will load the KekId and the encrypted DEK from the EncryptionKeyMetadata and reach out to the DekProvider to retrieve the plaintext DEK. Then it will use the plaintext DEK to decrypt the file.

### This Implementation

#### Loading and Dumping

The table needs to be initialized with a KekId, so the KekId must be loadable from a Map<String, String> of properties (either the table properties or the catalog properties). For this reason, I'm also storing the KekId in the EncryptionKeyMetadata as a Map<String, String> and relying on each DekProvider to load it out. What do you all think about this? Is this a terrible idea?

For legacy reasons on my team's side, the DEK is also loaded in the same way, but this should probably be split up.

#### Dek Providers

**KmsDekProvider:** This will retrieve a DEK from an external key store. It is written with AWS KMS in mind. (This PR was motivated by trying to switch to AWS Java SDK 2 in order to use the iceberg-aws libraries.) This implementation should also work for GCP KMS.

**PlaintextDekProvider:** This won't actually encrypt the DEK when it is stored inside the EncryptionKeyMetadata. This is for users who encrypt manifests.
**DelegatingDekProvider:** Some catalogs will support multiple types of DEK providers, and some tables might transition from one DEK provider to another. The KekId will carry a field naming the target DekProvider, and the DelegatingDekProvider will dispatch on that field.

#### How to use

At one point I was planning to have only the DelegatingDekProvider and to load DekProviders dynamically using a ServiceLoader. After @rdblue said these were slow, I'm now planning to just hardcode a map of the DekProviders we will use within our catalog implementation.

@mccheah is this how the EncryptionManager is expected to be used, or am I totally off base here?

Also @jackye1995, could we potentially hook in AWS KMS here?
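To make the write and read paths concrete, here is a minimal, self-contained sketch of the idea. It is not the code in this PR. The `Dek` class, the `DekProvider` method names, the `"provider"` field inside the KekId map, and the file layout (a 12-byte IV followed by AES-GCM ciphertext) are all assumptions chosen for illustration. Only the plaintext provider is shown, since a KMS-backed one would need a real client.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Illustrative sketch only: every type and method name here is hypothetical.
class DekProviderSketch {
  private static final int IV_LENGTH = 12;  // bytes, the usual GCM nonce size
  private static final int TAG_BITS = 128;  // GCM authentication tag length

  /** A DEK in plaintext form plus the form that gets stored in the key metadata. */
  static final class Dek {
    final byte[] plaintext;
    final byte[] encrypted;
    Dek(byte[] plaintext, byte[] encrypted) { this.plaintext = plaintext; this.encrypted = encrypted; }
  }

  interface DekProvider {
    Dek getNewDek(Map<String, String> kekId);
    byte[] getPlaintextDek(Map<String, String> kekId, byte[] encryptedDek);
  }

  /** Stores the DEK unwrapped, so the "encrypted" form equals the plaintext form. */
  static final class PlaintextDekProvider implements DekProvider {
    private final SecureRandom random = new SecureRandom();
    @Override public Dek getNewDek(Map<String, String> kekId) {
      byte[] key = new byte[16]; // 128-bit AES key
      random.nextBytes(key);
      return new Dek(key, key.clone());
    }
    @Override public byte[] getPlaintextDek(Map<String, String> kekId, byte[] encryptedDek) {
      return encryptedDek.clone();
    }
  }

  /** Picks the target provider from a hardcoded map, keyed by a field in the KekId. */
  static final class DelegatingDekProvider implements DekProvider {
    static final String PROVIDER_KEY = "provider"; // assumed field name
    private final Map<String, DekProvider> providers;
    DelegatingDekProvider(Map<String, DekProvider> providers) { this.providers = providers; }
    private DekProvider target(Map<String, String> kekId) {
      String name = kekId.get(PROVIDER_KEY);
      DekProvider provider = name == null ? null : providers.get(name);
      if (provider == null) throw new IllegalArgumentException("No DekProvider for KekId " + kekId);
      return provider;
    }
    @Override public Dek getNewDek(Map<String, String> kekId) { return target(kekId).getNewDek(kekId); }
    @Override public byte[] getPlaintextDek(Map<String, String> kekId, byte[] encryptedDek) {
      return target(kekId).getPlaintextDek(kekId, encryptedDek);
    }
  }

  /** Encrypts data with the plaintext DEK; the output is IV followed by ciphertext. */
  static byte[] encrypt(byte[] dek, byte[] data) {
    try {
      byte[] iv = new byte[IV_LENGTH];
      new SecureRandom().nextBytes(iv);
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(dek, "AES"), new GCMParameterSpec(TAG_BITS, iv));
      byte[] ciphertext = cipher.doFinal(data);
      byte[] out = Arrays.copyOf(iv, IV_LENGTH + ciphertext.length);
      System.arraycopy(ciphertext, 0, out, IV_LENGTH, ciphertext.length);
      return out;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Reverses encrypt(): reads the IV from the front of the blob, then decrypts the rest. */
  static byte[] decrypt(byte[] dek, byte[] blob) {
    try {
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(dek, "AES"),
          new GCMParameterSpec(TAG_BITS, blob, 0, IV_LENGTH));
      return cipher.doFinal(blob, IV_LENGTH, blob.length - IV_LENGTH);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    DekProvider provider = new DelegatingDekProvider(Map.of("plaintext", new PlaintextDekProvider()));
    Map<String, String> kekId = Map.of(DelegatingDekProvider.PROVIDER_KEY, "plaintext");

    // Write path: fetch a DEK, encrypt the file, keep kekId + dek.encrypted as key metadata.
    Dek dek = provider.getNewDek(kekId);
    byte[] file = encrypt(dek.plaintext, "row data".getBytes(StandardCharsets.UTF_8));

    // Read path: recover the plaintext DEK from the stored form, then decrypt the file.
    byte[] key = provider.getPlaintextDek(kekId, dek.encrypted);
    System.out.println(new String(decrypt(key, file), StandardCharsets.UTF_8));
  }
}
```

In this sketch a KMS-backed provider would be one more entry in the map passed to `DelegatingDekProvider`, and a table could move between providers by changing the `"provider"` field in its KekId while old files keep the KekId they were written with.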
