Yes, as Gang mentioned, I was indeed referring to the data page version, which is stored in the data page header. It looks like we will need to modify the CLI to print it out.
Thanks, Micah Kornfield and Gang Wu for the info!

Thanks,
Simhadri G

On Sun, Apr 23, 2023 at 7:18 AM Gang Wu <[email protected]> wrote:

> CMIW, the writer version here means the data page version [1], which is
> stored in the data page header [2] and differs from format version [3].
>
> The format version can be obtained directly via show metadata data command
> suggested by Micah.
>
> Although there is a command line in the parquet-mr to print page metadata
> [4], unfortunately it doesn't print the data page version. The cli may need
> extra work to print them out.
>
> [1] https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md?plain=1#L130
> [2] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L668
> [3] https://github.com/apache/parquet-format/blob/master/CHANGES.md
> [4] https://github.com/apache/parquet-mr/blob/master/parquet-cli/README.md?plain=1#L84
>
> Best,
> Gang
>
> On Sun, Apr 23, 2023 at 5:02 AM Micah Kornfield <[email protected]> wrote:
>
> > I'm not familiar with it but I would think the show metadata data command
> > would work to get general metadata. Please note the version field is not
> > entirely helpful as some implementations always hard-code it to a certain
> > value. The application/created by is generally a better way to determine
> > the writer.
> >
> > Another way of doing this is with pyarrow [1]
> >
> > [1] https://arrow.apache.org/docs/python/generated/pyarrow.parquet.read_metadata.html
> >
> > On Thu, Apr 20, 2023 at 6:42 AM Simhadri G <[email protected]> wrote:
> >
> > > Hi everyone,
> > >
> > > I have a question regarding the WRITER_VERSION =
> > > “parquet.writer.version”.
> > >
> > > I understand that the writer version can have one of the following 2
> > > values. [1]
> > >
> > > PARQUET_1_0 ("v1"),
> > > PARQUET_2_0 ("v2");
> > >
> > > I currently have a parquet file and I would like to determine the parquet
> > > writer version used to write this file. I have tried to obtain the
> > > metadata/dump using parquet-tools, but unfortunately, this did not include
> > > the information I needed.
> > >
> > > Therefore, I would be most grateful if someone could please help me out by
> > > advising where I can find the writer version information. Thank you very
> > > much for your time and assistance.
> > >
> > > Thanks,
> > > Simhadri G
> > >
> > > [1] https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java#L69
