Hi

Do you have a sample Java program to validate/read the min/max of column
groups in a Parquet file?

Thanks

On Tue, Dec 8, 2015 at 2:50 PM, Cheng Lian <[email protected]> wrote:

> Cc'd the Parquet dev list. I originally meant to discuss this issue on the
> Parquet dev list but sent it to the wrong mailing list. However, I think it's
> OK to discuss it here, since lots of Spark users use Parquet and this
> information should be generally useful.
>
> Comments inlined.
>
> On 12/7/15 10:34 PM, Shushant Arora wrote:
>
> How do I read it using parquet-tools?
> When I ran
> hadoop parquet.tools.Main meta parquetfilename
>
> I didn't get any info on min and max values.
>
> I didn't realize that you meant to inspect min/max values, since what you
> asked was how to find the version of the Parquet library that was used to
> generate the Parquet file.
>
> Currently parquet-tools doesn't print min/max statistics information. I'm
> afraid you'll have to do it programmatically.
>
> How can I see the Parquet version of my file? Are min/max statistics
> specific to some Parquet version, or have they been available since the
> beginning?
>
> AFAIK, it was added in 1.5.0
> https://github.com/apache/parquet-mr/blob/parquet-1.5.0/parquet-column/src/main/java/parquet/column/statistics/Statistics.java
>
> But I failed to find the corresponding JIRA ticket or pull request for this.
>
>
>
> On Mon, Dec 7, 2015 at 6:51 PM, Singh, Abhijeet <[email protected]>
> wrote:
>
>> Yes, Parquet has min/max.
>>
>>
>>
>> *From:* Cheng Lian [mailto:[email protected]]
>> *Sent:* Monday, December 07, 2015 11:21 AM
>> *To:* Ted Yu
>> *Cc:* Shushant Arora; [email protected]
>> *Subject:* Re: parquet file doubts
>>
>>
>>
>> Oh sorry... At first I meant to cc the spark-user list since Shushant and I
>> had discussed some Spark-related issues before. Then I realized that
>> this is a pure Parquet issue, but forgot to change the cc list. Thanks for
>> pointing this out! Please ignore this thread.
>>
>> Cheng
>>
>> On 12/7/15 12:43 PM, Ted Yu wrote:
>>
>> Cheng:
>>
>> I only see user@spark in the CC.
>>
>>
>>
>> FYI
>>
>>
>>
>> On Sun, Dec 6, 2015 at 8:01 PM, Cheng Lian <[email protected]> wrote:
>>
>> cc parquet-dev list (it would be nice to always do so for these general
>> questions.)
>>
>> Cheng
>>
>> On 12/6/15 3:10 PM, Shushant Arora wrote:
>>
>> Hi
>>
>> I have a few doubts about the Parquet file format.
>>
>> 1. Does Parquet keep min/max statistics like ORC does? How can I see the
>> Parquet version (whether it's 1.1, 1.2, or 1.3) of a Parquet file generated
>> using Hive, custom MR, or AvroParquetOutputFormat?
>>
>> Yes, Parquet also keeps row group statistics. You may check the Parquet
>> file using the parquet-meta CLI tool in parquet-tools (see
>> https://github.com/Parquet/parquet-mr/issues/321 for details), then look
>> for the "creator" field of the file. For programmatic access, check for
>> o.a.p.hadoop.metadata.FileMetaData.createdBy.
>>
>>
>> 2. How do I sort Parquet records while generating a Parquet file using
>> AvroParquetOutputFormat?
>>
>> AvroParquetOutputFormat is not a format. It's just responsible for
>> converting Avro records to Parquet records. How are you using
>> AvroParquetOutputFormat? Any example snippets?
>>
>>
>> Thanks
>>
>>
>>
>
>
>
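
For the programmatic route mentioned in the thread, a minimal Java sketch might look like the one below. It is an illustration under stated assumptions, not code from any of the posters. It assumes parquet-hadoop and hadoop-common are on the classpath and that the release uses the org.apache.parquet package names (older releases such as 1.5.0, linked above, use parquet.* instead). The class name ParquetMinMax and the helper formatStats are made up for this example. It reads the file footer, prints the createdBy string, and then prints the min/max statistics of each column chunk in each row group.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.column.statistics.Statistics;
import org.apache.parquet.format.converter.ParquetMetadataConverter;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;

class ParquetMinMax {

    // Hypothetical helper (not part of Parquet): renders one column chunk's
    // statistics as a single line of text.
    static String formatStats(String column, Object min, Object max, long nulls) {
        return column + ": min=" + min + ", max=" + max + ", nulls=" + nulls;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the path of a single Parquet file.
        Path file = new Path(args[0]);

        // Read only the footer; no row data is loaded.
        ParquetMetadata footer = ParquetFileReader.readFooter(
                new Configuration(), file, ParquetMetadataConverter.NO_FILTER);

        // The writer library and version, as recorded in the file metadata.
        System.out.println("created by: " + footer.getFileMetaData().getCreatedBy());

        int rowGroup = 0;
        for (BlockMetaData block : footer.getBlocks()) {
            System.out.println("row group " + rowGroup++
                    + " (" + block.getRowCount() + " rows)");
            for (ColumnChunkMetaData column : block.getColumns()) {
                String name = column.getPath().toDotString();
                Statistics<?> stats = column.getStatistics();
                if (stats == null || stats.isEmpty()) {
                    // Files written without statistics report nothing here.
                    System.out.println("  " + name + ": no statistics");
                } else if (!stats.hasNonNullValue()) {
                    // Only nulls were seen, so min/max carry no information.
                    System.out.println("  " + name + ": all nulls ("
                            + stats.getNumNulls() + ")");
                } else {
                    System.out.println("  " + formatStats(name,
                            stats.genericGetMin(), stats.genericGetMax(),
                            stats.getNumNulls()));
                }
            }
        }
    }
}
```

Run it with the file path as the only argument. Note that readFooter is marked deprecated in more recent parquet-mr releases, which favor opening the file through ParquetFileReader.open and calling getFooter(), so treat the call above as one option rather than the recommended one for every version.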
