jackye1995 edited a comment on pull request #2444:
URL: https://github.com/apache/iceberg/pull/2444#issuecomment-817024583


   > I was thinking one option would be along side the TableMetadata location 
within the external metastore for the active snapshot's TableMetadata, and then 
within the TableMetadata log for previous TableMetadatas
   
   @johnclara I was thinking something similar, and in fact the original API 
was:
   
   ```
   OutputFile encrypt(TableIdentifier tableId, OutputFile rawOutput);
   ```
   
   So that you can derive the key ID based on some information like the table 
identifier, which can be used in your use case. But table operations do not 
have a consistent way to get the table ID; catalogs such as 
`NessieCatalog` do not have this information. 
   
   So with the current encrypt API, I see 2 approaches to achieve the 
one-KEK-per-table use case:
   
   1. derive the table ID from `rawOutput.location()`: as long as the location 
follows the standard naming structure, we can get the namespace and table name, 
retrieve the KEK ID, and generate a new DEK to encrypt the new metadata file.
   2. take `TableIdentifier` as an input of the 
`TableMetadataEncryptionManager` implementation, so that the implementation 
can reference this information in the encrypt method.
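   
   As a rough illustration of approach 1, here is a minimal sketch of parsing 
the namespace and table name out of a metadata file location. The class name 
`LocationTableId` is hypothetical (not part of Iceberg), and the layout 
`.../<namespace>/<table>/metadata/<file>` with a single-level namespace is an 
assumption; catalogs that use other layouts would need different parsing.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not an Iceberg class. */
final class LocationTableId {
  private LocationTableId() {}

  /**
   * Returns [namespace, table] parsed from a metadata file location.
   * Assumes the layout .../<namespace>/<table>/metadata/<file> and
   * throws IllegalArgumentException when the location does not match it.
   */
  static List<String> parse(String metadataLocation) {
    String[] parts = metadataLocation.split("/");
    int n = parts.length;
    // Need at least namespace, table, "metadata", and the file name.
    if (n < 4 || !"metadata".equals(parts[n - 2])
        || parts[n - 4].isEmpty() || parts[n - 3].isEmpty()) {
      throw new IllegalArgumentException(
          "Unexpected metadata location: " + metadataLocation);
    }
    return Arrays.asList(parts[n - 4], parts[n - 3]);
  }
}
```

   The returned pair could then be used to look up the KEK ID for the table. 
Approach 2 avoids this parsing entirely, at the cost of constructing one 
manager per table.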
   
   The encrypted DEK can then be stored in 2 ways:
   1. in an external system, for example a very simple DynamoDB 
table that stores key-value pairs of (metadataLocation, encryptedDek). The 
decrypt method can then easily decrypt based on that mapping plus the table 
ID derived from either of the 2 methods above.
   2. as you suggested, by adding it to the historical metadata log list. 
   
   However, I personally don't like the second approach, for the following 
reasons:
   1. this again places a dependency on the table metadata. For each decryption, 
we need to call get table, check whether it is the latest table metadata, and 
then choose whether or not to open the table metadata to get the correct 
encrypted DEK. 
   2. every write of new table metadata needs to get the externally managed 
KEK + encrypted DEK and store it as a historical log entry, which would require 
a lot of hooks in different places to achieve this goal and to satisfy both 
single and double wrap encryption.
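   
   For comparison, the first storage option (an external mapping of 
metadataLocation to encryptedDek) could look roughly like the sketch below. 
Everything in it is an assumption for illustration: `ExternalDekStore` is a 
hypothetical class, an in-memory map stands in for the DynamoDB table, the KEK 
is held locally rather than fetched from a KMS by KEK ID, and AES key wrap from 
the JDK is used as one possible wrapping scheme.

```java
import java.security.GeneralSecurityException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Hypothetical in-memory stand-in for a (metadataLocation, encryptedDek) table. */
final class ExternalDekStore {
  private final Map<String, byte[]> wrappedDeks = new ConcurrentHashMap<>();
  private final SecretKey kek; // assumption: in practice resolved from a KMS per table

  ExternalDekStore(SecretKey kek) {
    this.kek = kek;
  }

  /** Generates a random 128-bit AES key. */
  static SecretKey newAesKey() {
    try {
      KeyGenerator generator = KeyGenerator.getInstance("AES");
      generator.init(128);
      return generator.generateKey();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("AES key generation failed", e);
    }
  }

  /** Creates a fresh DEK for a new metadata file and records it wrapped by the KEK. */
  SecretKey newDek(String metadataLocation) {
    try {
      SecretKey dek = newAesKey();
      Cipher cipher = Cipher.getInstance("AESWrap");
      cipher.init(Cipher.WRAP_MODE, kek);
      wrappedDeks.put(metadataLocation, cipher.wrap(dek));
      return dek;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Failed to wrap DEK", e);
    }
  }

  /** Looks up the wrapped DEK by metadata location and unwraps it with the KEK. */
  SecretKey lookupDek(String metadataLocation) {
    byte[] wrapped = wrappedDeks.get(metadataLocation);
    if (wrapped == null) {
      throw new IllegalStateException("No DEK recorded for " + metadataLocation);
    }
    try {
      Cipher cipher = Cipher.getInstance("AESWrap");
      cipher.init(Cipher.UNWRAP_MODE, kek);
      return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Failed to unwrap DEK", e);
    }
  }
}
```

   In this sketch, decryption needs only the metadata location and the KEK, so 
nothing has to be read from `TableMetadata` to find the right encrypted DEK.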
   
   What I hope is to have a clean cut between what keys are managed by Iceberg 
and what keys are completely managed externally, so that 
`TableMetadataEncryptionManager` is completely independent of `TableMetadata`, 
and `EncryptionManager` can fully depend on `TableMetadata`. 
   
   If you have a good suggestion for how to store the encrypted DEK with 
approach 2, please let me know.




