jackye1995 opened a new pull request #2443:
URL: https://github.com/apache/iceberg/pull/2443


   This is the initial PR based on the design doc 
https://docs.google.com/document/d/1kkcjr9KrlB9QagRX3ToulG_Rf-65NMSlVANheDNzJq4/edit#
   
   It introduces envelope encryption to `TableMetadata` as part of the Iceberg 
managed spec. Because row key support is also being contributed to the spec 
right now, this PR contains only the encryption spec itself. A follow-up PR 
will add all the parsers and wire the spec into table metadata; otherwise the 
merge conflicts would be hard for me to resolve. The spec looks something like:
   
   ```
   "encryption": {
       "manifest-list-config": {
         "kek-id": "kekId1",
         "algorithm": "AES_GCM",
         "properties": {
           "aad.writer": "process1"
         }
       },
       "manifest-file-config": {
         "kek-id": "kekId2",
         "algorithm": "AES_CTR"
       },
       "data-file-config": {
         "mek-id": "mekId",
         "algorithm": "AES_GCM_CTR",
         "properties": {
           "aad.writer": "process1"
         },
         "column-metadata": [
           {
             "mek-id": "mekId",
             "kek-id": "kekId",
             "algorithm": "AES_GCM_CTR",
             "properties": {
               "mask.type": "null"
             },
             "column-ids": [
               1,
               2
             ]
           }
         ]
       }
     }
   ```
   
   This allows different Iceberg file types to be encrypted with different 
encryption keys. Consequently, two new `encrypt` APIs are added to 
`EncryptionManager` to handle those different file types. 
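
As a rough illustration of per-file-type keys, the sketch below maps each file type from the example spec above to its key id and algorithm. `FileType`, `FileEncryptionConfig`, and `configFor` are hypothetical names chosen for this example and are not the APIs added in this PR.

```java
import java.util.EnumMap;
import java.util.Map;

public class FileTypeKeySketch {

  // Hypothetical enum for the three file types named in the example spec.
  enum FileType { MANIFEST_LIST, MANIFEST_FILE, DATA_FILE }

  // Hypothetical holder for one "*-config" entry; field names are illustrative.
  static final class FileEncryptionConfig {
    final String keyId;
    final String algorithm;

    FileEncryptionConfig(String keyId, String algorithm) {
      this.keyId = keyId;
      this.algorithm = algorithm;
    }
  }

  // Builds the lookup table from the values shown in the example spec.
  static Map<FileType, FileEncryptionConfig> exampleConfigs() {
    Map<FileType, FileEncryptionConfig> configs = new EnumMap<>(FileType.class);
    configs.put(FileType.MANIFEST_LIST, new FileEncryptionConfig("kekId1", "AES_GCM"));
    configs.put(FileType.MANIFEST_FILE, new FileEncryptionConfig("kekId2", "AES_CTR"));
    configs.put(FileType.DATA_FILE, new FileEncryptionConfig("mekId", "AES_GCM_CTR"));
    return configs;
  }

  // Resolves the config for a file type, failing loudly if none is configured.
  static FileEncryptionConfig configFor(
      Map<FileType, FileEncryptionConfig> configs, FileType type) {
    FileEncryptionConfig config = configs.get(type);
    if (config == null) {
      throw new IllegalArgumentException("No encryption config for " + type);
    }
    return config;
  }

  public static void main(String[] args) {
    Map<FileType, FileEncryptionConfig> configs = exampleConfigs();
    for (FileType type : FileType.values()) {
      FileEncryptionConfig config = configFor(configs, type);
      System.out.println(type + " -> " + config.keyId + " / " + config.algorithm);
    }
  }
}
```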
   
   To support envelope key encryption, it also introduces a `KeyProvider` that 
has the bare minimum API needed to communicate with a KMS to get and decrypt 
keys. A higher-level interface, `EnvelopeKeyManager`, is used to perform caching 
and other potential key management optimizations. The actual implementations of 
these interfaces will be added in separate PRs.
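
To make the split between the two layers concrete, here is a minimal sketch of envelope encryption with a caching layer on top. It is an assumption-laden illustration, not the interfaces in this PR: the method names `wrapKey` and `unwrapKey`, the in-memory stand-in for a KMS, and `CachingKeyManager` are all hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class EnvelopeKeySketch {

  // Hypothetical minimal KMS-facing API; the PR's real KeyProvider may differ.
  interface KeyProvider {
    byte[] wrapKey(String kekId, byte[] plaintextKey);
    byte[] unwrapKey(String kekId, byte[] wrappedKey);
  }

  // In-memory stand-in for a KMS: the KEKs never leave this class.
  static class InMemoryKeyProvider implements KeyProvider {
    private static final int IV_LENGTH = 12;
    private static final int TAG_BITS = 128;
    private final Map<String, SecretKey> keks = new HashMap<>();
    private final SecureRandom random = new SecureRandom();
    int unwrapCalls = 0; // counts simulated KMS round trips

    void createKek(String kekId) {
      try {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        keks.put(kekId, generator.generateKey());
      } catch (GeneralSecurityException e) {
        throw new IllegalStateException(e);
      }
    }

    @Override
    public byte[] wrapKey(String kekId, byte[] plaintextKey) {
      try {
        byte[] iv = new byte[IV_LENGTH];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, keks.get(kekId), new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintextKey);
        // Wrapped layout: IV followed by ciphertext (which includes the GCM tag).
        byte[] wrapped = Arrays.copyOf(iv, IV_LENGTH + ciphertext.length);
        System.arraycopy(ciphertext, 0, wrapped, IV_LENGTH, ciphertext.length);
        return wrapped;
      } catch (GeneralSecurityException e) {
        throw new IllegalStateException(e);
      }
    }

    @Override
    public byte[] unwrapKey(String kekId, byte[] wrappedKey) {
      unwrapCalls++;
      try {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, keks.get(kekId),
            new GCMParameterSpec(TAG_BITS, wrappedKey, 0, IV_LENGTH));
        return cipher.doFinal(wrappedKey, IV_LENGTH, wrappedKey.length - IV_LENGTH);
      } catch (GeneralSecurityException e) {
        throw new IllegalStateException(e);
      }
    }
  }

  // Hypothetical higher-level manager: caches unwrapped keys to avoid KMS calls.
  static class CachingKeyManager {
    private final KeyProvider provider;
    private final Map<String, byte[]> cache = new HashMap<>();

    CachingKeyManager(KeyProvider provider) {
      this.provider = provider;
    }

    byte[] unwrap(String kekId, byte[] wrappedKey) {
      String cacheKey = kekId + ":" + Base64.getEncoder().encodeToString(wrappedKey);
      byte[] key = cache.computeIfAbsent(cacheKey, k -> provider.unwrapKey(kekId, wrappedKey));
      return key.clone(); // defensive copy so callers cannot mutate the cache
    }
  }

  public static void main(String[] args) {
    InMemoryKeyProvider kms = new InMemoryKeyProvider();
    kms.createKek("kekId1");
    byte[] dataKey = new byte[16];
    new SecureRandom().nextBytes(dataKey);
    byte[] wrapped = kms.wrapKey("kekId1", dataKey);

    CachingKeyManager manager = new CachingKeyManager(kms);
    manager.unwrap("kekId1", wrapped);
    manager.unwrap("kekId1", wrapped); // second call is served from the cache
    System.out.println("KMS unwrap calls: " + kms.unwrapCalls);
  }
}
```

Only the wrapped key would need to be stored alongside the table metadata in this sketch; the KEK stays inside the provider, and the cache keeps repeated reads of the same file from triggering repeated KMS calls.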
   
   This PR also introduces a concept called encryption pushdown. When it is 
enabled, the encryption manager does not encrypt or decrypt the entire file 
through its own input/output stream. Instead, it passes the stream down together 
with all the encryption metadata, so that a specific file format can pick up the 
stream and perform native encryption.
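
The control flow of pushdown can be sketched roughly as follows. Every name here (`EncryptionMetadata`, `OpenedFile`, `DecryptingInputStream`, `open`) is hypothetical and exists only to show the branch; the placeholder stream passes bytes through unchanged rather than performing real decryption.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

public class PushdownSketch {

  // Hypothetical bundle of what a format-native reader would need.
  static final class EncryptionMetadata {
    final String keyId;
    final String algorithm;

    EncryptionMetadata(String keyId, String algorithm) {
      this.keyId = keyId;
      this.algorithm = algorithm;
    }
  }

  // Placeholder for manager-level decryption; it passes bytes through unchanged.
  static final class DecryptingInputStream extends FilterInputStream {
    DecryptingInputStream(InputStream in) {
      super(in);
    }
  }

  // Result handed to the file format reader.
  static final class OpenedFile {
    final InputStream stream;
    final EncryptionMetadata pushedDown; // null when the manager handles decryption

    OpenedFile(InputStream stream, EncryptionMetadata pushedDown) {
      this.stream = stream;
      this.pushedDown = pushedDown;
    }
  }

  static OpenedFile open(InputStream raw, EncryptionMetadata metadata, boolean pushdownEnabled) {
    if (pushdownEnabled) {
      // Pushdown: hand the raw stream and the metadata to the format reader.
      return new OpenedFile(raw, metadata);
    }
    // No pushdown: the manager wraps the whole stream itself.
    return new OpenedFile(new DecryptingInputStream(raw), null);
  }

  public static void main(String[] args) {
    InputStream raw = new ByteArrayInputStream(new byte[] {1, 2, 3});
    EncryptionMetadata metadata = new EncryptionMetadata("mekId", "AES_GCM_CTR");
    OpenedFile pushed = open(raw, metadata, true);
    OpenedFile wrapped = open(raw, metadata, false);
    System.out.println("pushdown carries metadata: " + (pushed.pushedDown != null));
    System.out.println("non-pushdown wraps stream: "
        + (wrapped.stream instanceof DecryptingInputStream));
  }
}
```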
   
   To keep this PR small and readable, the encrypting and decrypting streams 
currently just pass data through like plaintext streams. I will add the 
concrete implementation in a separate PR.
   
   @rdblue @ggershinsky @shangxinli @andersonm-ibm @yyanyy

