shangxinli commented on pull request #808: URL: https://github.com/apache/parquet-mr/pull/808#issuecomment-669333155
The way it works is that SchemaCryptoPropertiesFactory makes an RPC call to the KMS to get the column key for the keyId/metadata defined in the metastore, and then builds the encryption properties from the column key, keyId/metadata, etc. We don't want to expose helper functions such as setKey(), because that would require code changes in the existing pipelines. Instead, we extend ParquetWriteSupport so that it converts the crypto settings in the schema (an Avro schema or some other schema) into the Parquet schema's metadata. The class SchemaCryptoPropertiesFactory then just consumes that metadata and builds the encryption properties required by PARQUET-1178. This way, SchemaCryptoPropertiesFactory and the extended ParquetWriteSupport are released as a library plugin and can be enabled at the cluster level, so the pipelines only need to change the setting that enables the plugin and add it to the classpath.

In the test file SchemaCryptoPropertiesFactory, I hardcoded the key and key metadata because we don't have a real KMS to use. CryptoGroupWriteSupport in the test is an example of extending WriteSupport to set the metadata. In real usage, the metadata can be obtained from the schema, for example the Avro schema for Hudi, or the table properties in HMS, so it depends on which Parquet application it is. Here in the test I hardcoded it for explanatory purposes only. As I mentioned earlier, I have CryptoHoodieAvroWriteSupport here: https://github.com/shangxinli/parquet-write-supports/blob/master/src/main/java/com/uber/hoodie/avro/CryptoHoodieAvroWriteSupport.java, but in this PR I just used CryptoGroupWriteSupport as a demo.

I know it is a little confusing, because in the Parquet code we don't really let the user set the key or key metadata. What we actually need from Parquet is just an extra field to transport the crypto settings, without Parquet knowing the format or content of those settings. Everything else is implemented in the plugin (shown in the test). I didn't address your other comments yet.
Let me know if you are OK with it then I can start looking at those other comments.
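
To make the flow concrete, here is a rough plain-Java sketch of the two halves of the plugin: the write-support side that copies crypto settings into schema metadata, and the factory side that reads that metadata, asks a KMS for the key, and builds per-column settings. It deliberately does not use the real Parquet classes. `KmsClient`, `ColumnCrypto`, the `crypto.keyid.` prefix, and both method names are made up for illustration, and a flat map stands in for the Parquet schema's metadata.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java sketch of the plugin flow; none of these types are Parquet classes. */
class CryptoPluginSketch {

    /** Hypothetical stand-in for the RPC client that talks to a KMS. */
    interface KmsClient {
        byte[] fetchColumnKey(String keyId);
    }

    /** Hypothetical per-column result: the column key plus the metadata used to look it up. */
    static final class ColumnCrypto {
        final byte[] key;
        final String keyMetadata;

        ColumnCrypto(byte[] key, String keyMetadata) {
            this.key = key;
            this.keyMetadata = keyMetadata;
        }
    }

    /** Assumed naming convention for metadata entries; not defined by Parquet. */
    static final String KEY_ID_PREFIX = "crypto.keyid.";

    /** Write-support side: carry the crypto settings from the source schema as metadata. */
    static Map<String, String> toSchemaMetadata(Map<String, String> columnToKeyId) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : columnToKeyId.entrySet()) {
            metadata.put(KEY_ID_PREFIX + e.getKey(), e.getValue());
        }
        return metadata;
    }

    /** Factory side: consume the metadata, fetch each key from the KMS, build column settings. */
    static Map<String, ColumnCrypto> buildColumnCrypto(Map<String, String> schemaMetadata,
                                                       KmsClient kms) {
        Map<String, ColumnCrypto> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : schemaMetadata.entrySet()) {
            if (!e.getKey().startsWith(KEY_ID_PREFIX)) {
                continue; // unrelated metadata is ignored
            }
            String column = e.getKey().substring(KEY_ID_PREFIX.length());
            String keyId = e.getValue();
            result.put(column, new ColumnCrypto(kms.fetchColumnKey(keyId), keyId));
        }
        return result;
    }

    public static void main(String[] args) {
        // Fake KMS returning a hardcoded key, in the same spirit as the hardcoded test.
        KmsClient fakeKms = keyId -> "0123456789abcdef".getBytes(StandardCharsets.UTF_8);

        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("ssn", "key-1"); // column name -> key id, as it might appear in a schema

        Map<String, ColumnCrypto> crypto = buildColumnCrypto(toSchemaMetadata(settings), fakeKms);
        for (Map.Entry<String, ColumnCrypto> e : crypto.entrySet()) {
            System.out.println(e.getKey() + " uses key metadata " + e.getValue().keyMetadata);
        }
    }
}
```

The point of the split is that a pipeline never calls either method directly: the first half lives in the extended write support, the second in the factory, and the only thing passed between them is the metadata.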
