Hi everyone,

I have two questions about the definition of struct ColumnMetaData in
parquet.thrift:

   1. The field "path_in_schema" is a list of strings. Shouldn't there be
      only one path in the schema for a given column? In parquet-hadoop, the
      corresponding class "ColumnChunkMetaData" has the field "ColumnPath
      path", which is not a list.

   2. The field "codec" represents the compression codec of the column.
      Why is it not a list? Must all pages in the same column use the same
      compression codec?

Can anyone explain this?

Below is the definition snippet of ColumnMetaData in parquet.thrift.

struct ColumnMetaData {
  ...
  3: required list<string> path_in_schema

  4: required CompressionCodec codec
  ...
}
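
To make the first question concrete, here is a minimal Python sketch of one possible reading. It is only an assumption on my part, not something the snippet confirms: the list might hold the components of a single nested path rather than several separate paths. The class name ColumnMetaDataSketch, the helper to_dot_string, and the example column user.address.zip are all hypothetical and are not part of parquet.thrift or parquet-hadoop.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnMetaDataSketch:
    """Hypothetical stand-in for the two Thrift fields quoted above."""
    path_in_schema: List[str]  # assumed: components of ONE path, root to leaf
    codec: str                 # a single codec value for the whole column chunk


def to_dot_string(path_in_schema: List[str]) -> str:
    """Join the path components into one dotted column name."""
    return ".".join(path_in_schema)


# Hypothetical nested column: a "zip" leaf inside "address" inside "user".
meta = ColumnMetaDataSketch(
    path_in_schema=["user", "address", "zip"],
    codec="SNAPPY",
)
dotted = to_dot_string(meta.path_in_schema)
print(dotted)
```

If that reading is right, the list and the single "ColumnPath path" would describe the same thing, which is what I would like someone to confirm.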

Thanks & Best Regards

——————————

Tenghuan He
