Yes, Parquet has min/max statistics.

From: Cheng Lian [mailto:l...@databricks.com]
Sent: Monday, December 07, 2015 11:21 AM
To: Ted Yu
Cc: Shushant Arora; user@spark.apache.org
Subject: Re: parquet file doubts

Oh, sorry... At first I meant to cc the spark-user list, since Shushant and I had 
discussed some Spark-related issues before. Then I realized that this is a 
pure Parquet issue, but forgot to change the cc list. Thanks for pointing this 
out! Please ignore this thread.

Cheng
On 12/7/15 12:43 PM, Ted Yu wrote:
Cheng:
I only see user@spark in the CC.

FYI

On Sun, Dec 6, 2015 at 8:01 PM, Cheng Lian 
<l...@databricks.com> wrote:
cc the parquet-dev list (it would be nice to always do so for general 
Parquet questions like these).

Cheng

On 12/6/15 3:10 PM, Shushant Arora wrote:
Hi

I have a few questions about the Parquet file format.

1. Does Parquet keep min/max statistics like ORC does? How can I see the 
Parquet version (whether it's 1.1, 1.2, or 1.3) of a Parquet file generated 
using Hive, custom MR, or AvroParquetOutputFormat?
Yes, Parquet also keeps row group statistics, including min/max. To see which 
version wrote a file, inspect it with the parquet-meta CLI tool in 
parquet-tools (see https://github.com/Parquet/parquet-mr/issues/321 for 
details) and look at the "creator" field of the file. For programmatic access, 
check o.a.p.hadoop.metadata.FileMetaData.createdBy.
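Once you have the creator/createdBy string, the writer version still has to be pulled out of it. A minimal sketch, assuming the string follows the parquet-mr style "parquet-mr version X.Y.Z (build <hash>)"; other writers may use a different layout, in which case the function returns None. The function name and the sample string are hypothetical, for illustration only.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: parquet-mr-style created_by strings look like
# "parquet-mr version 1.6.0 (build <hash>)". Other writers may differ.
_CREATED_BY = re.compile(r"^(?P<app>\S+) version (?P<version>\d+(?:\.\d+)*)")


def parse_created_by(created_by: str) -> Optional[Tuple[str, str]]:
    """Return (application, version) from a created_by string, or None if it doesn't match."""
    m = _CREATED_BY.match(created_by)
    if m is None:
        return None
    return m.group("app"), m.group("version")


# Hypothetical sample value, not taken from a real file.
print(parse_created_by("parquet-mr version 1.6.0 (build 0123abc)"))  # ('parquet-mr', '1.6.0')
```

Note that this is the version of the library that wrote the file (e.g. parquet-mr 1.6.0), which is what the 1.1/1.2/1.3 numbers in the question refer to.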

2. How can I sort Parquet records while generating a Parquet file using 
AvroParquetOutputFormat?
AvroParquetOutputFormat is not a separate file format. It's just responsible 
for converting Avro records into Parquet records. How are you using 
AvroParquetOutputFormat? Do you have an example snippet?
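Assuming the writer keeps records in the order it receives them (it does not reorder them), any sorting has to happen before records reach AvroParquetOutputFormat, for example by sorting on the key in the MapReduce shuffle and writing from the reducer. The toy model below is plain Python, not Parquet's API. It sketches why sorting matters for the min/max statistics from question 1: it splits keys into hypothetical "row groups" of a fixed row count (real row groups are sized in bytes) and checks which groups a range filter could skip. All names and values are made up for illustration.

```python
from typing import List, Tuple


def row_group_stats(keys: List[int], rows_per_group: int) -> List[Tuple[int, int]]:
    """(min, max) of each consecutive chunk of keys, mimicking per-row-group stats."""
    groups = [keys[i:i + rows_per_group] for i in range(0, len(keys), rows_per_group)]
    return [(min(g), max(g)) for g in groups]


def groups_to_read(stats: List[Tuple[int, int]], lo: int, hi: int) -> List[int]:
    """Indices of groups whose [min, max] overlaps [lo, hi]; the rest could be skipped."""
    return [i for i, (mn, mx) in enumerate(stats) if mx >= lo and mn <= hi]


keys = [7, 1, 9, 3, 8, 2, 6, 4, 5, 0, 11, 10]  # made-up sort-key values
unsorted_stats = row_group_stats(keys, 4)
sorted_stats = row_group_stats(sorted(keys), 4)

print(unsorted_stats, groups_to_read(unsorted_stats, 0, 2))  # [(1, 9), (2, 8), (0, 11)] [0, 1, 2]
print(sorted_stats, groups_to_read(sorted_stats, 0, 2))      # [(0, 3), (4, 7), (8, 11)] [0]
```

With unsorted input every group's range overlaps the filter, so nothing can be skipped; with sorted input the ranges are disjoint and only one group needs to be read.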

Thanks
