omalley edited a comment on issue #20: Encryption in Data Files
URL: 
https://github.com/apache/incubator-iceberg/issues/20#issuecomment-443363218
 
 
   I understand that column encryption takes file format support and that this 
support isn't available yet, although it will be available for ORC soon.
   
   I haven't looked at the details of Palantir's hadoop-crypto library, but the 
approach looks good. For per-file encryption, I would:
   * Define the key name in the Iceberg table metadata.
   * When writing, call the KMS to generate a random local key and the 
corresponding encrypted bytes. Create a random IV for each file. When you 
update the manifest for the file, add the key name, key version, encryption 
algorithm, IV, and encrypted local key.
   * When reading, use the metadata from the manifest to have the KMS decrypt 
the key for that file. Use the decrypted key and IV to decrypt the file as 
needed.
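
   The write and read steps above can be sketched in Java as follows. This is 
only an illustration under stated assumptions: `FakeKms`, `EncryptedFile`, 
`write`, and `read` are hypothetical names, not Iceberg or hadoop-crypto APIs; 
the KMS is an in-memory stand-in; and the choices of 128-bit keys, `AESWrap` 
for wrapping the local key, and `AES/GCM/NoPadding` for the file contents are 
picked for the sketch rather than taken from the proposal.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: these names are not Iceberg or hadoop-crypto APIs.
class PerFileEncryptionSketch {
    static final SecureRandom RANDOM = new SecureRandom();
    static final String ALGORITHM = "AES/GCM/NoPadding"; // assumed cipher for file contents

    static byte[] randomBytes(int n) {
        byte[] b = new byte[n];
        RANDOM.nextBytes(b);
        return b;
    }

    /** A fresh local key together with its encrypted (wrapped) form. */
    static final class GeneratedKey {
        final SecretKey plaintext;
        final byte[] encrypted;
        GeneratedKey(SecretKey plaintext, byte[] encrypted) {
            this.plaintext = plaintext;
            this.encrypted = encrypted;
        }
    }

    /** In-memory stand-in for a KMS; the master key never leaves this class. */
    static final class FakeKms {
        private final SecretKey masterKey = new SecretKeySpec(randomBytes(16), "AES");
        final int keyVersion = 1;

        /** The single KMS trip on write: a random local key plus its wrapped bytes. */
        GeneratedKey generateLocalKey() throws Exception {
            SecretKey local = new SecretKeySpec(randomBytes(16), "AES");
            Cipher wrap = Cipher.getInstance("AESWrap");
            wrap.init(Cipher.WRAP_MODE, masterKey);
            return new GeneratedKey(local, wrap.wrap(local));
        }

        /** The single KMS trip on read: unwrap the local key stored in the manifest. */
        SecretKey decryptLocalKey(byte[] encryptedLocalKey) throws Exception {
            Cipher unwrap = Cipher.getInstance("AESWrap");
            unwrap.init(Cipher.UNWRAP_MODE, masterKey);
            return (SecretKey) unwrap.unwrap(encryptedLocalKey, "AES", Cipher.SECRET_KEY);
        }
    }

    /** The five proposed manifest fields, plus the encrypted file bytes for the demo. */
    static final class EncryptedFile {
        final String keyName;
        final int keyVersion;
        final String algorithm;
        final byte[] iv;
        final byte[] encryptedLocalKey;
        final byte[] ciphertext;

        EncryptedFile(String keyName, int keyVersion, String algorithm,
                      byte[] iv, byte[] encryptedLocalKey, byte[] ciphertext) {
            this.keyName = keyName;
            this.keyVersion = keyVersion;
            this.algorithm = algorithm;
            this.iv = iv;
            this.encryptedLocalKey = encryptedLocalKey;
            this.ciphertext = ciphertext;
        }
    }

    static EncryptedFile write(FakeKms kms, String keyName, byte[] data) throws Exception {
        GeneratedKey key = kms.generateLocalKey(); // new local key for every file
        byte[] iv = randomBytes(12);               // new random IV for every file
        Cipher cipher = Cipher.getInstance(ALGORITHM);
        cipher.init(Cipher.ENCRYPT_MODE, key.plaintext, new GCMParameterSpec(128, iv));
        return new EncryptedFile(keyName, kms.keyVersion, ALGORITHM, iv,
                                 key.encrypted, cipher.doFinal(data));
    }

    static byte[] read(FakeKms kms, EncryptedFile file) throws Exception {
        SecretKey local = kms.decryptLocalKey(file.encryptedLocalKey);
        Cipher cipher = Cipher.getInstance(file.algorithm);
        cipher.init(Cipher.DECRYPT_MODE, local, new GCMParameterSpec(128, file.iv));
        return cipher.doFinal(file.ciphertext);
    }

    public static void main(String[] args) throws Exception {
        FakeKms kms = new FakeKms();
        EncryptedFile file = write(kms, "table-key", "row data".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(read(kms, file), StandardCharsets.UTF_8));
    }
}
```

   In this sketch only the wrapped local key and the IV travel with the file's 
metadata; the reader needs the KMS to recover the local key before it can 
decrypt anything.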
   
   The relevant features:
   * The master key stays in the KMS and is never given to the user.
   * There is only one trip to the KMS per file during reading or writing.
   * The encryption never reuses a local key/IV pair. Reusing a pair is very, 
very bad: with counter-based modes such as AES-CTR or AES-GCM it can expose 
the plaintext.
   * If the user keeps a local key, it can only be used to decrypt that file.
   * Rolling new versions of master keys is supported.
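
   Rolling master keys works because the manifest records the key version for 
each file. A minimal sketch, assuming a hypothetical in-memory `VersionedKms` 
(not a real KMS client API) that keeps older master key versions available for 
reads and wraps new local keys under the latest version:

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: a versioned in-memory KMS, not a real KMS client API.
class KeyRotationSketch {
    static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newAesKey() {
        byte[] raw = new byte[16];
        RANDOM.nextBytes(raw);
        return new SecretKeySpec(raw, "AES");
    }

    static final class VersionedKms {
        private final Map<Integer, SecretKey> masterKeys = new HashMap<>();
        private int currentVersion = 0;

        VersionedKms() {
            rotate(); // start at version 1
        }

        /** Rolls a new master key version; older versions stay available for reads. */
        int rotate() {
            currentVersion++;
            masterKeys.put(currentVersion, newAesKey());
            return currentVersion;
        }

        int version() {
            return currentVersion;
        }

        /** Wraps a local key under the current master key version. */
        byte[] wrap(SecretKey localKey) throws Exception {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, masterKeys.get(currentVersion));
            return cipher.wrap(localKey);
        }

        /** Unwraps with the master key version recorded in the manifest. */
        SecretKey unwrap(int keyVersion, byte[] wrapped) throws Exception {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, masterKeys.get(keyVersion));
            return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        }
    }

    public static void main(String[] args) throws Exception {
        VersionedKms kms = new VersionedKms();
        SecretKey localKey = newAesKey();
        int writtenVersion = kms.version();      // stored in the manifest at write time
        byte[] wrapped = kms.wrap(localKey);

        kms.rotate();                            // new files now use the next version

        SecretKey recovered = kms.unwrap(writtenVersion, wrapped);
        System.out.println(Arrays.equals(localKey.getEncoded(), recovered.getEncoded()));
    }
}
```

   Files written before the rotation stay readable without being rewritten, 
since each manifest entry names the master key version that wrapped its local 
key.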

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
