CMIW, the writer version here means the data page version [1], which is
stored in the data page header [2] and differs from format version [3].

The format version can be obtained directly via the show metadata data
command suggested by Micah.
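
As a rough sketch of the pyarrow route from the quoted message below: the snippet reads the file footer with pyarrow.parquet.read_metadata and returns the format version and created-by string. The helper names describe_writer and read_writer_info are made up for illustration. Note that this reports the format version only, not the data page version.

```python
def describe_writer(meta):
    """Collect the writer-related fields from a file-metadata object.

    `meta` is assumed to expose `format_version` and `created_by`,
    as the object returned by pyarrow.parquet.read_metadata does.
    """
    return {
        "format_version": meta.format_version,
        "created_by": meta.created_by,
    }


def read_writer_info(path):
    """Read the footer of the Parquet file at `path` and summarize it."""
    # Imported here so describe_writer stays usable without pyarrow installed.
    import pyarrow.parquet as pq

    return describe_writer(pq.read_metadata(path))
```

Calling read_writer_info("some_file.parquet") (a placeholder path) should return a dict with both fields. As Micah notes, created_by is usually the more reliable of the two for identifying the writer.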

Although parquet-mr has a command-line tool that prints page metadata [4],
unfortunately it doesn't print the data page version. The CLI may need
extra work to print it.

[1]
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md?plain=1#L130
[2]
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L668
[3] https://github.com/apache/parquet-format/blob/master/CHANGES.md
[4]
https://github.com/apache/parquet-mr/blob/master/parquet-cli/README.md?plain=1#L84

Best,
Gang

On Sun, Apr 23, 2023 at 5:02 AM Micah Kornfield <[email protected]>
wrote:

> I'm not familiar with it but I would think the show metadata data command
> would work to get general metadata.  Please note the version field is not
> entirely helpful as some implementations always hard-code it to a certain
> value.  The application/created by is generally a better way to determine
> the writer.
>
> Another way of doing this is with pyarrow [1]
>
> [1]
>
> https://arrow.apache.org/docs/python/generated/pyarrow.parquet.read_metadata.html
>
> On Thu, Apr 20, 2023 at 6:42 AM Simhadri G <[email protected]> wrote:
>
> > Hi everyone,
> >
> > I have a question regarding WRITER_VERSION = “parquet.writer.version”.
> >
> > I understand that the writer version can have one of the following 2
> > values. [1]
> >
> > PARQUET_1_0 ("v1"),
> > PARQUET_2_0 ("v2");
> >
> > I currently have a parquet file and I would like to determine the parquet
> > writer version used to write this file. I have tried to obtain the
> > metadata/dump using parquet-tools, but unfortunately, this did not include
> > the information I needed.
> >
> > Therefore, I would be most grateful if someone could please help me out by
> > advising where I can find the writer version information. Thank you very
> > much for your time and assistance.
> >
> > Thanks,
> > Simhadri G
> >
> > [1]
> > https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L69
> >
>
