jackye1995 opened a new pull request #2443: URL: https://github.com/apache/iceberg/pull/2443
This is the initial PR based on the design doc https://docs.google.com/document/d/1kkcjr9KrlB9QagRX3ToulG_Rf-65NMSlVANheDNzJq4/edit#

It introduces envelope encryption to `TableMetadata` as part of the Iceberg-managed spec. Because there is an ongoing contribution of row key to the spec, this PR contains only the encryption spec itself. I will use another PR to add all the parsers and wire the spec into table metadata; otherwise the merge conflicts would be hard for me to resolve.

The spec looks something like:

```
"encryption": {
  "manifest-list-config": {
    "kek-id": "kekId1",
    "algorithm": "AES_GCM",
    "properties": {
      "aad.writer": "process1"
    }
  },
  "manifest-file-config": {
    "kek-id": "kekId2",
    "algorithm": "AES_CTR"
  },
  "data-file-config": {
    "mek-id": "mekId",
    "algorithm": "AES_GCM_CTR",
    "properties": {
      "aad.writer": "process1"
    },
    "column-metadata": [
      {
        "mek-id": "mekId",
        "kek-id": "kekId",
        "algorithm": "AES_GCM_CTR",
        "properties": {
          "mask.type": "null"
        },
        "column-ids": [ 1, 2 ]
      }
    ]
  }
}
```

It allows different Iceberg file types to be encrypted with different encryption keys. Consequently, 2 new `encrypt` APIs are added to `EncryptionManager` to handle those file types.

To support envelope key encryption, it also introduces a `KeyProvider` that has the bare minimum API needed to communicate with a KMS to get and decrypt keys. A higher-level interface, `EnvelopeKeyManager`, is used to perform caching and other potential key management optimizations. The actual implementations of these interfaces will be added in separate PRs.

A concept called encryption pushdown is also introduced. When it is enabled, instead of decrypting/encrypting the entire file through the encryption manager's input/output stream, the stream is passed down together with all the encryption metadata, so that a specific file format can pick up the stream and perform native encryption.
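As a rough illustration of the envelope encryption flow described above, here is a minimal sketch. It is not the code in this PR: the `KeyProvider` method names (`wrapKey`, `unwrapKey`), the `InMemoryKeyProvider` stand-in for a KMS, and the nonce-prefixed blob layout are all assumptions made for the example. Only `javax.crypto` classes from the JDK are used.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class EnvelopeEncryptionSketch {
  private static final int NONCE_LEN = 12; // bytes, the usual AES-GCM nonce size
  private static final int TAG_BITS = 128; // authentication tag length
  private static final SecureRandom RANDOM = new SecureRandom();

  /** Hypothetical KMS-facing API; the real KeyProvider in the PR may differ. */
  interface KeyProvider {
    byte[] wrapKey(String kekId, byte[] plaintextKey) throws GeneralSecurityException;

    byte[] unwrapKey(String kekId, byte[] wrappedKey) throws GeneralSecurityException;
  }

  /** In-memory stand-in for a KMS, for illustration only. */
  static class InMemoryKeyProvider implements KeyProvider {
    private final Map<String, SecretKey> keks = new HashMap<>();

    void createKek(String kekId) throws GeneralSecurityException {
      KeyGenerator generator = KeyGenerator.getInstance("AES");
      generator.init(128);
      keks.put(kekId, generator.generateKey());
    }

    private SecretKey kek(String kekId) {
      SecretKey key = keks.get(kekId);
      if (key == null) {
        throw new IllegalArgumentException("Unknown KEK: " + kekId);
      }
      return key;
    }

    @Override
    public byte[] wrapKey(String kekId, byte[] plaintextKey) throws GeneralSecurityException {
      return encrypt(kek(kekId), plaintextKey, null);
    }

    @Override
    public byte[] unwrapKey(String kekId, byte[] wrappedKey) throws GeneralSecurityException {
      return decrypt(kek(kekId), wrappedKey, null);
    }
  }

  /** AES-GCM encrypt; the output layout is nonce || ciphertext || tag. */
  static byte[] encrypt(SecretKey key, byte[] plaintext, byte[] aad)
      throws GeneralSecurityException {
    byte[] nonce = new byte[NONCE_LEN];
    RANDOM.nextBytes(nonce);
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
    if (aad != null) {
      cipher.updateAAD(aad);
    }
    byte[] sealed = cipher.doFinal(plaintext);
    byte[] out = new byte[NONCE_LEN + sealed.length];
    System.arraycopy(nonce, 0, out, 0, NONCE_LEN);
    System.arraycopy(sealed, 0, out, NONCE_LEN, sealed.length);
    return out;
  }

  /** AES-GCM decrypt of a blob produced by encrypt(); fails if the data or AAD was altered. */
  static byte[] decrypt(SecretKey key, byte[] blob, byte[] aad)
      throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, NONCE_LEN));
    if (aad != null) {
      cipher.updateAAD(aad);
    }
    return cipher.doFinal(blob, NONCE_LEN, blob.length - NONCE_LEN);
  }

  /** Encrypts and then decrypts a message with a fresh data key wrapped by "kekId1". */
  static String roundTrip(String message) {
    try {
      InMemoryKeyProvider kms = new InMemoryKeyProvider();
      kms.createKek("kekId1");
      byte[] aad = "process1".getBytes(StandardCharsets.UTF_8);

      // Writer: generate a data key, encrypt the payload, keep only the wrapped key.
      byte[] dek = new byte[16];
      RANDOM.nextBytes(dek);
      byte[] payload = message.getBytes(StandardCharsets.UTF_8);
      byte[] ciphertext = encrypt(new SecretKeySpec(dek, "AES"), payload, aad);
      byte[] wrappedDek = kms.wrapKey("kekId1", dek);

      // Reader: unwrap the data key through the provider, then decrypt the payload.
      byte[] unwrapped = kms.unwrapKey("kekId1", wrappedDek);
      byte[] plaintext = decrypt(new SecretKeySpec(unwrapped, "AES"), ciphertext, aad);
      return new String(plaintext, StandardCharsets.UTF_8);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Envelope round trip failed", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("hello iceberg")); // prints "hello iceberg"
  }
}
```

The point of the split is that only the small data key ever goes through the key provider, while the file contents are encrypted locally. A caching layer such as the `EnvelopeKeyManager` mentioned above would presumably sit between the caller and `unwrapKey`, but that is not modeled here.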
To keep this PR small and readable, the encrypting and decrypting streams currently just pass data through like a plaintext stream; I will add the concrete implementation in a separate PR.

@rdblue @ggershinsky @shangxinli @andersonm-ibm @yyanyy

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
