Hi,

Do you have a sample program in Java to read/validate the min/max values of column groups in a Parquet file?
Thanks

On Tue, Dec 8, 2015 at 2:50 PM, Cheng Lian <[email protected]> wrote:

> Cc'd Parquet dev list. At first I expected to discuss this issue on
> Parquet dev list but sent to the wrong mailing list. However, I think it's
> OK to discuss it here since lots of Spark users are using Parquet and this
> information should be generally useful here.
>
> Comments inlined.
>
> On 12/7/15 10:34 PM, Shushant Arora wrote:
>
> how to read it using parquet tools.
> When I did
> hadoop parquet.tools.Main meta prquetfilename
>
> I didn't get any info of min and max values.
>
> Didn't realize that you meant to inspect min/max values since what you
> asked was how to inspect the version of Parquet library that is used to
> generate the Parquet file.
>
> Currently parquet-tools doesn't print min/max statistics information. I'm
> afraid you'll have to do it programmatically.
>
> How can I see parquet version of my file.Is min max respective to some
> parquet version or available since beginning?
>
> AFAIK, it was added in 1.5.0
> https://github.com/apache/parquet-mr/blob/parquet-1.5.0/parquet-column/src/main/java/parquet/column/statistics/Statistics.java
>
> But I failed to find corresponding JIRA ticket or pull request for this.
>
> On Mon, Dec 7, 2015 at 6:51 PM, Singh, Abhijeet <[email protected]>
> wrote:
>
>> Yes, Parquet has min/max.
>>
>> *From:* Cheng Lian [mailto:[email protected]]
>> *Sent:* Monday, December 07, 2015 11:21 AM
>> *To:* Ted Yu
>> *Cc:* Shushant Arora; [email protected]
>> *Subject:* Re: parquet file doubts
>>
>> Oh sorry... At first I meant to cc spark-user list since Shushant and I
>> had been discussed some Spark related issues before. Then I realized that
>> this is a pure Parquet issue, but forgot to change the cc list. Thanks for
>> pointing this out! Please ignore this thread.
>>
>> Cheng
>>
>> On 12/7/15 12:43 PM, Ted Yu wrote:
>>
>> Cheng:
>>
>> I only see user@spark in the CC.
>>
>> FYI
>>
>> On Sun, Dec 6, 2015 at 8:01 PM, Cheng Lian <[email protected]> wrote:
>>
>> cc parquet-dev list (it would be nice to always do so for these general
>> questions.)
>>
>> Cheng
>>
>> On 12/6/15 3:10 PM, Shushant Arora wrote:
>>
>> Hi
>>
>> I have few doubts on parquet file format.
>>
>> 1.Does parquet keeps min max statistics like in ORC. how can I see
>> parquet version(whether its1.1,1.2or1.3) for parquet file generated using
>> hive or custom MR or AvroParquetoutputFormat.
>>
>> Yes, Parquet also keeps row group statistics. You may check the Parquet
>> file using the parquet-meta CLI tool in parquet-tools (see
>> https://github.com/Parquet/parquet-mr/issues/321 for details), then look
>> for the "creator" field of the file. For programmatic access, check for
>> o.a.p.hadoop.metadata.FileMetaData.createdBy.
>>
>> 2.how to sort parquet records while generating parquet file using
>> avroparquetoutput format?
>>
>> AvroParquetOutputFormat is not a format. It's just responsible for
>> converting Avro records to Parquet records. How are you using
>> AvroParquetOutputFormat? Any example snippets?
>>
>> Thanks
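
On the version question in the thread: the createdBy / "creator" string is where the writer version shows up, and per the reply above min/max statistics were added in 1.5.0 (AFAIK). Below is a minimal sketch, using only the JDK, of how such a string could be checked. It assumes the string looks like "parquet-mr version X.Y.Z (build ...)"; that layout is an assumption, and other writers or older files may format it differently. The class and method names are hypothetical and not part of parquet-mr, and reading the createdBy value out of the file footer is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of parquet-mr.
public class CreatedByVersion {

    // Assumed layout of the createdBy string: "... version <major>.<minor>.<patch> ..."
    private static final Pattern VERSION =
            Pattern.compile("version\\s+(\\d+)\\.(\\d+)\\.(\\d+)");

    /** Returns {major, minor, patch}, or null if no version can be found. */
    public static int[] parseVersion(String createdBy) {
        if (createdBy == null) {
            return null;
        }
        Matcher m = VERSION.matcher(createdBy);
        if (!m.find()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    /**
     * True if the writer version is at least 1.5.0, the release in which
     * min/max statistics were added according to the thread (AFAIK).
     * An unparseable string is treated as "unknown" and returns false.
     */
    public static boolean writerAtLeast150(String createdBy) {
        int[] v = parseVersion(createdBy);
        if (v == null) {
            return false;
        }
        if (v[0] != 1) {
            return v[0] > 1;
        }
        return v[1] >= 5;
    }

    public static void main(String[] args) {
        // The default sample string is made up for this example.
        String sample = args.length > 0
                ? args[0]
                : "parquet-mr version 1.6.0 (build abc123)";
        System.out.println(sample + " -> " + writerAtLeast150(sample));
    }
}
```

Note that this only tells you whether the writer was new enough to record statistics; whether a given column chunk actually carries min/max values still has to be checked against the file's own metadata.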
