shangxinli commented on pull request #808:
URL: https://github.com/apache/parquet-mr/pull/808#issuecomment-672148979


   @gszadovszky Thanks for the correction of PARQUET-1784! Regarding 
serialization/deserialization: it is not done yet. I was aware of that when I 
used ExtType, and it is something we will need to add later. It is actually 
needed: the use case is translating the schema inside Parquet files created 
by upstream (like rawdata) into downstream HiveETL metastores like HMS. 
Otherwise the lineage of the crypto properties will be broken. This is in fact 
one reason we should add metadata (along with serialization/deserialization) 
instead of using Configuration. 
   
   Creating helper functions helps, but the problem remains that we need to add 
a long namespace all the way down to the (nested) column level. Sometimes one 
job needs to deal with more than one metastore, which requires adding a prefix 
to the namespace. So to locate a column, we need something like 
metastore.db.table.column_outlayers....column_innerlayers.crypto_key. This is 
not user friendly. Again, other schemas like Avro and Spark already have such 
metadata, so I think it would be better to align with them. 
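
   To make the namespace concern concrete, here is a rough sketch of how such 
a key would have to be assembled for a nested column, next to a column-level 
metadata entry. Everything in it is hypothetical: the class and method names, 
the metastore/db/table/column values, and the plain Map used as a stand-in for 
Configuration and for field metadata. It is not parquet-mr API.

```java
import java.util.List;
import java.util.Map;

public class CryptoKeyLookup {

    // Hypothetical flat key layout, following the pattern described above:
    // metastore.db.table.<outer columns>...<inner columns>.crypto_key
    static String configKey(String metastore, String db, String table,
                            List<String> columnPath) {
        return String.join(".", metastore, db, table,
                String.join(".", columnPath), "crypto_key");
    }

    public static void main(String[] args) {
        // Stand-in for a job Configuration: one flat entry per encrypted column.
        String key = configKey("hms_prod", "sales", "orders",
                List.of("customer", "address", "zip"));
        Map<String, String> conf = Map.of(key, "key_id_1");
        System.out.println(key + " -> " + conf.get(key));

        // Stand-in for field-level metadata: the property travels with the
        // column itself, so no metastore/db/table prefix is needed.
        Map<String, String> zipFieldMetadata = Map.of("crypto_key", "key_id_1");
        System.out.println("zip -> " + zipFieldMetadata.get("crypto_key"));
    }
}
```

   With the flat layout, every caller has to know the full path and the 
metastore prefix; with metadata on the field, the lookup is local to the 
column.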
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

