shangxinli commented on pull request #808: URL: https://github.com/apache/parquet-mr/pull/808#issuecomment-672148979
@gszadovszky Thanks for the correction of PARQUET-1784! Regarding serialization/deserialization: it is not done yet. I was aware of that when I used ExtType. It is something we will need to add later, and it is actually needed.

The use case is that we need to translate the schema inside Parquet files created upstream, such as rawdata, to downstream HiveETL metastores such as HMS. Otherwise, the lineage of the crypto properties will be broken. This is one reason we should add metadata (along with serialization/deserialization) instead of using Configuration.

Creating helper functions helps, but the problem remains that we need to add a long namespace all the way down to the (nested) column level. Sometimes one job needs to deal with more than one metastore, which requires adding a prefix to the namespace. So to locate a column, we need something like metastore.db.table.column_outlayers....column_innerlayers.crypto_key. This is not user friendly.

Again, other schemas like Avro and Spark already support this kind of metadata, so I think it would be better to align with them.
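To make the namespace problem concrete, here is a minimal Java sketch contrasting a flat, Configuration-style key with a property attached directly to a schema node. Everything in it is hypothetical and for illustration only: `CryptoKeyLookupSketch`, `flatKey`, the `Field` class, the `hms_a` metastore prefix, and the `crypto_key` property name are not parquet-mr, Hadoop `Configuration`, or `ExtType` APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CryptoKeyLookupSketch {

    // Hypothetical flat, Configuration-style approach: every encrypted
    // (possibly nested) column needs a fully qualified key string.
    static String flatKey(String metastore, String db, String table,
                          List<String> columnPath, String property) {
        List<String> parts = new ArrayList<>();
        parts.add(metastore);
        parts.add(db);
        parts.add(table);
        parts.addAll(columnPath);
        parts.add(property);
        return String.join(".", parts);
    }

    // Hypothetical schema node that carries its own metadata, similar in
    // spirit to field-level properties in Avro or Spark schemas.
    static final class Field {
        final String name;
        final Map<String, String> metadata = new HashMap<>();
        final List<Field> children = new ArrayList<>();

        Field(String name) {
            this.name = name;
        }

        // Adds a nested child field and returns it so calls can be chained.
        Field child(String childName) {
            Field f = new Field(childName);
            children.add(f);
            return f;
        }
    }

    public static void main(String[] args) {
        // Flat approach: the key grows with the nesting depth and needs a
        // metastore prefix when one job talks to several metastores.
        String key = flatKey("hms_a", "db", "table",
                List.of("outer", "middle", "inner"), "crypto_key");
        System.out.println(key);

        // Schema-metadata approach: the property is attached to the column
        // node itself, so no global namespace string is needed to find it.
        Field root = new Field("table");
        Field inner = root.child("outer").child("middle").child("inner");
        inner.metadata.put("crypto_key", "key1");
        System.out.println(inner.metadata.get("crypto_key"));
    }
}
```

In the flat variant the lookup key for a three-level nested column already has seven segments, while in the metadata variant the property sits on the column node, so any tool that copies the schema could in principle carry it along once serialization is in place.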