gszadovszky commented on pull request #808: URL: https://github.com/apache/parquet-mr/pull/808#issuecomment-671925400
@shangxinli, the column-wise configuration you are referring to (PARQUET-1784: Column-wise configuration (#754)) is only a specified key format plus the related helper implementations for the Hadoop conf. We might have used this format to specify the encryption properties, but I'm afraid it is too late to do that, and I am not even sure it would make sense to set such properties in a completely different way from what the other components in the Hadoop era use.

I tend to agree with @ggershinsky. The way you want to extend the parquet schema is a general extension that allows adding any metadata to any schema element. However, I cannot see any purpose for it beyond the one you have described. Moreover, this way you are extending only the schema objects that are used inside parquet-mr. This metadata won't be written to the parquet files, nor serialized/deserialized to/from the metastore as is. Anything you want to be in this metadata has to be implemented either inside parquet-mr or in the plugins.

The benefit you have described of adding the encryption properties to the schema is that it is easier and less error-prone to define the properties right next to the schema elements (columns). But you can also write helper methods that write the proper key/values to the Hadoop conf or the extra metadata. These helpers can be unit tested to ensure they work correctly. This way the implementation of the ParquetWriteSupport can be compact and type/value checked.
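To illustrate the helper-method idea, here is a minimal sketch of what such a helper might look like. Everything in it is an assumption made for illustration: the class name `EncryptionConfHelper`, the key prefix `example.encryption.column.key.`, and the use of a plain `Map<String, String>` as a stand-in for the Hadoop conf so the example compiles without Hadoop on the classpath. It is not the parquet-mr API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of parquet-mr. A plain Map stands in for the
// Hadoop conf so the sketch has no external dependencies.
final class EncryptionConfHelper {
    // Assumed key prefix, for illustration only; not a real parquet-mr property.
    static final String COLUMN_KEY_PREFIX = "example.encryption.column.key.";

    private EncryptionConfHelper() {}

    // Writes the key id for one column, rejecting blank inputs up front so
    // that mistakes surface at configuration time rather than at write time.
    static void setColumnKey(Map<String, String> conf, String columnPath, String keyId) {
        Objects.requireNonNull(conf, "conf");
        if (columnPath == null || columnPath.trim().isEmpty()) {
            throw new IllegalArgumentException("columnPath must not be blank");
        }
        if (keyId == null || keyId.trim().isEmpty()) {
            throw new IllegalArgumentException("keyId must not be blank");
        }
        conf.put(COLUMN_KEY_PREFIX + columnPath, keyId);
    }

    // Reads the key id back; returns null if the column has no key configured.
    static String getColumnKey(Map<String, String> conf, String columnPath) {
        Objects.requireNonNull(conf, "conf");
        return conf.get(COLUMN_KEY_PREFIX + columnPath);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        setColumnKey(conf, "user.ssn", "key-1");
        System.out.println(conf);
    }
}
```

A write support implementation could call such a helper once per sensitive column, and a unit test can check both the round trip and the rejection of blank values.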
