Actually, adding a single line below this line <https://github.com/apache/parquet-mr/blob/fa7588c4c0f8d403e4815fa72e3b8a3bc98d73ec/parquet-tools/src/main/java/org/apache/parquet/tools/util/MetadataUtils.java#L152> should make parquet-meta print min/max statistics:

if (!meta.getStatistics().isEmpty()) out.format(" STA:[%s]", meta.getStatistics().toString());

Cheng

On 12/14/15 8:42 PM, Shushant Arora wrote:
Hi

Do you have a sample Java program to read/validate the min/max values of the row groups in a Parquet file?

Thanks

On Tue, Dec 8, 2015 at 2:50 PM, Cheng Lian <[email protected]> wrote:

    Cc'd the Parquet dev list. I originally meant to discuss this issue
    on the Parquet dev list but sent it to the wrong mailing list. However,
    I think it's OK to discuss it here, since lots of Spark users use
    Parquet and this information should be generally useful.

    Comments inlined.

    On 12/7/15 10:34 PM, Shushant Arora wrote:
    How do I read it using parquet-tools? When I ran

        hadoop parquet.tools.Main meta parquetfilename

    I didn't get any info about the min and max values.
    I didn't realize that you meant to inspect min/max values, since what
    you asked was how to inspect the version of the Parquet library used
    to generate the Parquet file.

    Currently parquet-tools doesn't print min/max statistics
    information. I'm afraid you'll have to do it programmatically.
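A minimal sketch of doing it programmatically, assuming parquet-mr 1.7 or later (where the packages moved from `parquet.*` to `org.apache.parquet.*`) with parquet-hadoop and the Hadoop client on the classpath. It reads only the file footer with `ParquetFileReader.readFooter` and, for every row group and column chunk, prints the min/max/null count held by `ColumnChunkMetaData.getStatistics()`, the same object the parquet-tools one-liner above prints. The class name `ParquetStatsDump` and its `describe` helper are hypothetical, and newer parquet-mr releases may mark `readFooter` as deprecated, so check the API of the version you build against.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.column.statistics.Statistics;
import org.apache.parquet.format.converter.ParquetMetadataConverter;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;

/** Hypothetical helper: dumps per-row-group, per-column min/max statistics from a Parquet footer. */
public class ParquetStatsDump {

  /** Formats created_by plus each column chunk's statistics as text lines. */
  public static List<String> describe(ParquetMetadata footer) {
    List<String> lines = new ArrayList<>();
    lines.add("created_by: " + footer.getFileMetaData().getCreatedBy());
    int rowGroup = 0;
    for (BlockMetaData block : footer.getBlocks()) {
      lines.add(String.format("row group %d: %d rows", rowGroup++, block.getRowCount()));
      for (ColumnChunkMetaData column : block.getColumns()) {
        String name = column.getPath().toDotString();
        Statistics<?> stats = column.getStatistics();
        if (stats == null || stats.isEmpty()) {
          // Column chunks without stored statistics (e.g. written by older writers) end up here.
          lines.add(String.format("  %s: no statistics", name));
        } else {
          lines.add(String.format("  %s: min=%s max=%s nulls=%d",
              name, stats.genericGetMin(), stats.genericGetMax(), stats.getNumNulls()));
        }
      }
    }
    return lines;
  }

  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("usage: ParquetStatsDump <path-to-parquet-file>");
      return;
    }
    // Reads only the footer (schema + row group metadata), not the data pages.
    ParquetMetadata footer = ParquetFileReader.readFooter(
        new Configuration(), new Path(args[0]), ParquetMetadataConverter.NO_FILTER);
    for (String line : describe(footer)) {
      System.out.println(line);
    }
  }
}
```

Pass the Parquet file path as the only argument. Keeping the formatting in `describe` separate from the I/O in `main` makes it easy to reuse the same loop on a footer obtained some other way.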
    How can I see the Parquet version of my file? Are min/max statistics
    tied to a particular Parquet version, or have they been available since the beginning?
    AFAIK, it was added in 1.5.0:

    https://github.com/apache/parquet-mr/blob/parquet-1.5.0/parquet-column/src/main/java/parquet/column/statistics/Statistics.java

    But I couldn't find the corresponding JIRA ticket or pull request
    for this.



    On Mon, Dec 7, 2015 at 6:51 PM, Singh, Abhijeet
    <[email protected]> wrote:

        Yes, Parquet has min/max.

        *From:* Cheng Lian [mailto:[email protected]]
        *Sent:* Monday, December 07, 2015 11:21 AM
        *To:* Ted Yu
        *Cc:* Shushant Arora; [email protected]
        *Subject:* Re: parquet file doubts

        Oh sorry... At first I meant to cc the spark-user list, since
        Shushant and I had discussed some Spark-related issues before.
        Then I realized that this is a pure Parquet issue, but forgot
        to change the cc list. Thanks for pointing this out! Please
        ignore this thread.

        Cheng

        On 12/7/15 12:43 PM, Ted Yu wrote:

            Cheng:

            I only see user@spark in the CC.

            FYI

            On Sun, Dec 6, 2015 at 8:01 PM, Cheng Lian
            <[email protected]> wrote:

            cc parquet-dev list (it would be nice to always do so for
            these general questions.)

            Cheng

            On 12/6/15 3:10 PM, Shushant Arora wrote:

            Hi

            I have a few doubts about the Parquet file format.

            1. Does Parquet keep min/max statistics like ORC does? How
            can I see the Parquet version (whether it's 1.1, 1.2, or 1.3)
            of a Parquet file generated using Hive, custom MR, or
            AvroParquetOutputFormat?

            Yes, Parquet also keeps row group statistics. You may
            check the Parquet file using the parquet-meta CLI tool in
            parquet-tools (see
            https://github.com/Parquet/parquet-mr/issues/321 for
            details), then look for the "creator" field of the file.
            For programmatic access, check for
            o.a.p.hadoop.metadata.FileMetaData.createdBy.
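A minimal sketch of turning that `createdBy` string (available as `footer.getFileMetaData().getCreatedBy()`) into a version check against the 1.5.0 cutoff mentioned earlier in the thread. It assumes parquet-mr writes `created_by` in the form `parquet-mr version X.Y.Z (build <hash>)`; that format, the class name `CreatedByVersion`, and its methods are illustrative assumptions, and strings from other writers are simply reported as "cannot confirm".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: checks whether created_by names parquet-mr at or above a given version. */
public final class CreatedByVersion {

  // ASSUMPTION: parquet-mr writes created_by as "parquet-mr version X.Y.Z (build <hash>)".
  // Strings from other writers do not match and are treated as unknown.
  private static final Pattern PARQUET_MR =
      Pattern.compile("^parquet-mr version (\\d+)\\.(\\d+)\\.(\\d+)");

  private CreatedByVersion() {}

  /** Returns {major, minor, patch}, or null if createdBy is not a recognizable parquet-mr string. */
  public static int[] parseParquetMrVersion(String createdBy) {
    if (createdBy == null) {
      return null;
    }
    Matcher m = PARQUET_MR.matcher(createdBy);
    if (!m.find()) {
      return null;
    }
    return new int[] {
        Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3))};
  }

  /** True only if createdBy names parquet-mr at version >= major.minor.patch. */
  public static boolean isParquetMrAtLeast(String createdBy, int major, int minor, int patch) {
    int[] v = parseParquetMrVersion(createdBy);
    if (v == null) {
      return false; // unknown writer or format: cannot confirm
    }
    int[] wanted = {major, minor, patch};
    for (int i = 0; i < 3; i++) {
      if (v[i] != wanted[i]) {
        return v[i] > wanted[i]; // first differing component decides
      }
    }
    return true; // versions are equal
  }

  public static void main(String[] args) {
    String createdBy = args.length > 0 ? args[0] : "parquet-mr version 1.8.1 (build abc123)";
    System.out.println(createdBy + " -> parquet-mr >= 1.5.0: "
        + isParquetMrAtLeast(createdBy, 1, 5, 0));
  }
}
```

The version check only says whether the writer could have produced statistics; whether a given column chunk actually carries them is still best tested directly with `Statistics.isEmpty()`.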


            2. How do I sort Parquet records while generating a Parquet
            file using AvroParquetOutputFormat?

            AvroParquetOutputFormat is not a format. It's just
            responsible for converting Avro records to Parquet
            records. How are you using AvroParquetOutputFormat? Any
            example snippets?


            Thanks



            