Thanks, Cheng! Here is a useful blog post about question 2: http://grepalex.com/2014/05/13/parquet-file-format-and-object-model/
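
To tie questions 1 and 2 together, here is a toy Python sketch of why sort order matters for row group statistics. It is not the Parquet API: `RowGroupStats`, `build_stats`, and `groups_to_scan` are hypothetical names made up for illustration, and the "row groups" are just fixed-size slices of a list. It assumes a reader that skips any group whose min/max range cannot contain the value being looked up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Toy stand-in for one row group's min/max statistics on one column."""
    min_value: int
    max_value: int


def build_stats(values: List[int], rows_per_group: int) -> List[RowGroupStats]:
    """Split values into fixed-size groups and record min/max for each group."""
    stats = []
    for start in range(0, len(values), rows_per_group):
        chunk = values[start:start + rows_per_group]
        stats.append(RowGroupStats(min(chunk), max(chunk)))
    return stats


def groups_to_scan(stats: List[RowGroupStats], target: int) -> List[int]:
    """Return indices of groups whose [min, max] range could contain target."""
    return [i for i, s in enumerate(stats)
            if s.min_value <= target <= s.max_value]


# Hypothetical column values, written in arrival order and in sorted order.
unsorted_values = [7, 1, 9, 3, 8, 2, 6, 4]
unsorted_stats = build_stats(unsorted_values, rows_per_group=2)
sorted_stats = build_stats(sorted(unsorted_values), rows_per_group=2)

print(groups_to_scan(unsorted_stats, 4))  # [0, 1, 2, 3]
print(groups_to_scan(sorted_stats, 4))    # [1]
```

With the unsorted input, every group's range covers 4, so nothing can be skipped. After sorting, only one group needs to be read.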

On Sun, Dec 6, 2015 at 9:52 PM, Cheng Lian <[email protected]> wrote:

> cc parquet-dev list (it would be nice to always do so for these general
> questions.)
>
> Cheng
>
> On 12/6/15 3:10 PM, Shushant Arora wrote:
>
>> Hi
>>
>> I have a few doubts about the Parquet file format.
>>
>> 1. Does Parquet keep min/max statistics like ORC does? How can I see the
>> Parquet version (whether it's 1.1, 1.2, or 1.3) for a Parquet file
>> generated using Hive, custom MR, or AvroParquetOutputFormat?
>>
>
> Yes, Parquet also keeps row group statistics. You may check the Parquet
> file using the parquet-meta CLI tool in parquet-tools (see
> https://github.com/Parquet/parquet-mr/issues/321 for details), then look
> for the "creator" field of the file. For programmatic access, check for
> o.a.p.hadoop.metadata.FileMetaData.createdBy.
>
>
>> 2. How do I sort Parquet records while generating a Parquet file using
>> AvroParquetOutputFormat?
>>
>
> AvroParquetOutputFormat is not a format. It's just responsible for
> converting Avro records to Parquet records. How are you using
> AvroParquetOutputFormat? Any example snippets?
>
>
>> Thanks
>>
>
>

--
Julien
