Oh sorry... At first I meant to cc the spark-user list, since Shushant and I
had discussed some Spark-related issues before. Then I realized
that this is a pure Parquet issue, but forgot to change the cc list.
Thanks for pointing this out! Please ignore this thread.
Cheng
On 12/7/15 12:43 PM, Ted Yu wrote:
Cheng:
I only see user@spark in the CC.
FYI
On Sun, Dec 6, 2015 at 8:01 PM, Cheng Lian <l...@databricks.com> wrote:
cc parquet-dev list (it would be nice to always do so for these
general questions.)
Cheng
On 12/6/15 3:10 PM, Shushant Arora wrote:
Hi
I have a few doubts about the Parquet file format.
1. Does Parquet keep min/max statistics like ORC does? How can I
see the Parquet version (whether it's 1.1, 1.2, or 1.3) of a Parquet file
generated using Hive, custom MR, or AvroParquetOutputFormat?
Yes, Parquet also keeps row group statistics. You may check the
Parquet file using the parquet-meta CLI tool in parquet-tools (see
https://github.com/Parquet/parquet-mr/issues/321 for details),
then look for the "creator" field of the file. For programmatic
access, check for o.a.p.hadoop.metadata.FileMetaData.createdBy.
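To illustrate why row group statistics matter, here is a minimal pure-Python sketch of how per-row-group min/max values can let a reader skip row groups that cannot match a range filter. It is only a conceptual model, not Parquet's API: `RowGroupStats` and `row_groups_to_scan` are hypothetical names, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for one column's min/max statistics in one row group."""
    index: int
    min_value: int
    max_value: int


def row_groups_to_scan(stats: List[RowGroupStats], lo: int, hi: int) -> List[int]:
    """Return indices of row groups whose [min, max] range overlaps [lo, hi].

    Row groups outside the range cannot contain matching rows and can be skipped.
    """
    return [s.index for s in stats if s.max_value >= lo and s.min_value <= hi]


# Made-up statistics for three row groups of a single integer column.
stats = [
    RowGroupStats(index=0, min_value=1, max_value=100),
    RowGroupStats(index=1, min_value=101, max_value=200),
    RowGroupStats(index=2, min_value=201, max_value=300),
]

# A filter such as 150 <= value <= 180 only needs the second row group.
print(row_groups_to_scan(stats, 150, 180))
```

The skipping only helps when the min/max ranges of different row groups do not overlap much, which is generally the case when the data is written in sorted order on the filtered column.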
2. How can I sort Parquet records while generating a Parquet file
using AvroParquetOutputFormat?
AvroParquetOutputFormat is not a format. It's just responsible for
converting Avro records to Parquet records. How are you using
AvroParquetOutputFormat? Any example snippets?
Thanks