shangxinli commented on pull request #808:
URL: https://github.com/apache/parquet-mr/pull/808#issuecomment-669333155


   The way it works is that SchemaCryptoPropertiesFactory makes an RPC call to 
the KMS to get the column key for the keyId/metadata defined in the metastore, 
and builds the encryption properties from the column key, keyId/metadata, etc. 
We don't want to release helper functions like setKey() because that would 
require code changes in the existing pipelines. Instead, we extend 
ParquetWriteSupport, which converts the crypto settings in the schema (it could 
be an Avro schema or another schema) into the Parquet schema's metadata. The 
SchemaCryptoPropertiesFactory class then just consumes that metadata and builds 
the encryption properties required by PARQUET-1178. This way, 
SchemaCryptoPropertiesFactory and the extended ParquetWriteSupport are released 
as a library plugin that can be enabled at the cluster level, so the pipelines 
only need to change the setting that enables it and add it to the classpath. 
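
   As a rough sketch of the factory side of that flow (this is not the code in 
the PR): the class below takes crypto settings that were carried in the schema 
metadata, asks a KMS for each column key, and collects per-column properties. 
KmsClient, ColumnCrypto and buildColumnProperties are hypothetical names used 
only for illustration, and plain Java types stand in for the Parquet 
encryption classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaCryptoSketch {

    /** Hypothetical KMS client; a real one would make an RPC call. */
    interface KmsClient {
        byte[] fetchKey(String keyId);
    }

    /** Hypothetical stand-in for per-column encryption properties. */
    static final class ColumnCrypto {
        final String columnPath;
        final byte[] key;
        final byte[] keyMetadata;

        ColumnCrypto(String columnPath, byte[] key, byte[] keyMetadata) {
            this.columnPath = columnPath;
            this.key = key;
            this.keyMetadata = keyMetadata;
        }
    }

    /**
     * Consumes the crypto settings carried in the schema metadata
     * (column path -> key id) and resolves each key through the KMS.
     */
    static Map<String, ColumnCrypto> buildColumnProperties(
            Map<String, String> columnKeyIds, KmsClient kms) {
        Map<String, ColumnCrypto> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : columnKeyIds.entrySet()) {
            String columnPath = entry.getKey();
            String keyId = entry.getValue();
            byte[] key = kms.fetchKey(keyId);
            if (key == null) {
                throw new IllegalStateException("No key for keyId " + keyId);
            }
            // Assumption: the key id itself is stored as the key metadata.
            byte[] keyMetadata = keyId.getBytes(StandardCharsets.UTF_8);
            result.put(columnPath, new ColumnCrypto(columnPath, key, keyMetadata));
        }
        return result;
    }

    public static void main(String[] args) {
        // Fake KMS returning a hardcoded 16-byte key, since no real KMS is available.
        KmsClient fakeKms = keyId -> "0123456789012345".getBytes(StandardCharsets.UTF_8);

        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("user.ssn", "key-1");

        Map<String, ColumnCrypto> props = buildColumnProperties(settings, fakeKms);
        System.out.println(props.keySet());
    }
}
```

   The pipeline code never appears in this sketch, which is the point: the 
settings arrive through the schema metadata and the keys come from the KMS. 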
   
   In the test file SchemaCryptoPropertiesFactory, I hardcoded the 
key/key metadata because we don't have a real KMS to use. 
CryptoGroupWriteSupport in the test is an example of extending WriteSupport to 
set the metadata. In real usage, the metadata can be obtained from the schema, 
for example the Avro schema for Hudi, or the table properties in HMS, so it 
depends on which Parquet application it is. In the test I hardcoded it for 
illustration only. As I mentioned earlier, I have CryptoHoodieAvroWriteSupport 
here: 
https://github.com/shangxinli/parquet-write-supports/blob/master/src/main/java/com/uber/hoodie/avro/CryptoHoodieAvroWriteSupport.java
but in the test I just used CryptoGroupWriteSupport as a demo. 
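
   To illustrate the write-support side in isolation, here is a minimal sketch 
(again, not the code in the PR) of copying crypto settings from a source schema 
into per-column metadata. SourceField, toColumnMetadata and the entry names 
"encrypted" and "keyId" are hypothetical; a real write support would read them 
from the Avro schema, HMS table properties, or whatever the application uses. 

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class CryptoMetadataSketch {

    /** Hypothetical source-schema field: a name plus an optional key id. */
    static final class SourceField {
        final String name;
        final String keyId; // null means "do not encrypt this column"

        SourceField(String name, String keyId) {
            this.name = name;
            this.keyId = keyId;
        }
    }

    /**
     * Copies the crypto settings of each annotated field into a per-column
     * metadata map. The values are opaque strings: whatever transports them
     * does not need to understand their format.
     */
    static Map<String, Map<String, String>> toColumnMetadata(SourceField... fields) {
        Map<String, Map<String, String>> metadata = new LinkedHashMap<>();
        for (SourceField field : fields) {
            if (field.keyId == null) {
                continue; // unannotated columns carry no crypto metadata
            }
            Map<String, String> settings = new LinkedHashMap<>();
            settings.put("encrypted", "true");  // hypothetical entry name
            settings.put("keyId", field.keyId); // hypothetical entry name
            metadata.put(field.name, Collections.unmodifiableMap(settings));
        }
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> metadata = toColumnMetadata(
                new SourceField("name", null),
                new SourceField("ssn", "key-1"));
        System.out.println(metadata);
    }
}
```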
   
   I know it is a little confusing because in the Parquet code we don't really 
let the user set the key or key metadata. What we actually need from Parquet is 
just an extra field to transport the crypto settings, without Parquet knowing 
the format or content of those settings. Everything else is implemented in the 
plugin (shown in the test). 
   
   I haven't addressed your other comments yet. Let me know if you are OK with 
this approach, and then I can start looking at them. 



