Thanks Julien and Nong. Looks like parquet supports both column (re: Nong) and 
file (re: Julien) level metadata.right?
Regards,Mohammad
 

    On Thursday, June 30, 2016 11:37 AM, Julien Le Dem <[email protected]> wrote:
 

 You can store arbitrary key values alongside the schema in the footer:
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L565
 
<https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L565>
struct FileMetaData {
  /** Version of this file **/
  1: required i32 version

  /** Parquet schema for this file.  This schema contains metadata for all the 
columns.
  * The schema is represented as a tree with a single root.  The nodes of the 
tree
  * are flattened to a list by doing a depth-first traversal.
  * The column metadata contains the path in the schema for that column which 
can be
  * used to map columns to nodes in the schema.
  * The first element is the root **/
  2: required list<SchemaElement> schema;

  /** Number of rows in this file **/
  3: required i64 num_rows

  /** Row groups in this file **/
  4: required list<RowGroup> row_groups

  /** Optional key/value metadata **/
  5: optional list<KeyValue> key_value_metadata

  /** String for application that wrote this file.  This should be in the format
  * <Application> version <App Version> (build <App Build Hash>).
  * e.g. impala version 1.0 (build 6cf94d29b2b7115df4de2c06e2ab4326d721eb55)
  **/
  6: optional string created_by
}

You could make the key something like "{some unique name prefix specific to 
you}.PII.columns”=a.b.c,d.e.f


> On Jun 30, 2016, at 10:44 AM, Mohammad Islam <[email protected]> 
> wrote:
> 
> Hi All,
> What is the best way of tagging any field schema with metadata? Does Parquet 
> support it? I think Avro has "doc" attribute. Also Hive schema has "comments".
> I need to tag each field whether it is PII or not. I think someone may want 
> to add description of a field as well.
> Regards,Mohammad
> 


  

Reply via email to