ggershinsky commented on pull request #1918:
URL: https://github.com/apache/iceberg/pull/1918#issuecomment-796797019


   Hi guys, now that parquet-mr-1.12 is about to be released, we're starting to
explore integration of Parquet column encryption in Iceberg. Thanks for looping
me in; this pull request is certainly related. From the discussion above and
from the PR code, it looks like the focus is on general file
encryption/decryption streams rather than on leveraging the native column
encryption capabilities in Parquet and ORC. That is fine, since the two offer
complementary sets of capabilities. I also agree that these mechanisms share
many common areas that we should define and reuse as much as possible. I'm sure
this is possible; e.g., I see references to the Parquet key management design
and to concepts like single and double wrapping. There are a number of open
questions, though: how to store key material and how to approach key rotation;
how to define key material, key metadata, key providers, etc.; whether to
protect column statistics with global or column-specific keys; and so on. I
think it would be good to have a design document that draws a top-down picture
of data encryption in Iceberg, with the goals and the staging/roadmap: how this
PR fits into it, how ORC and Parquet column encryption will fit in later, what
the common layers should be, etc.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
