Hi Gang

Thank you for looking into this. Updating the description on parquet.apache.org 
will save everyone searching for this information a few hours of head 
scratching. It is unfortunate that the slightly out-of-date spec features more 
prominently in Google results.

Kind regards

Kaili
________________________________
From: Gang Wu <ust...@gmail.com>
Sent: Tuesday, January 9, 2024 5:56 PM
To: dev@parquet.apache.org <dev@parquet.apache.org>
Subject: Re: Discrepancy in parquet format documentation

Hi Kaili,

You're right. Please refer to the parquet-format repo for specs. The site
has unfortunately been out of sync for a long time, and there isn't any
automatic process to update it. Let me update the site manually to bring it
in sync with the latest format release.

Best,
Gang

On Sun, Jan 7, 2024 at 8:03 AM Kaili Zhang <kaili...@hotmail.com> wrote:

> Hi all
>
> I found this page via Google when searching for a description of the
> parquet binary format:
> https://parquet.apache.org/docs/file-format/data-pages/. This page
> suggests that definition levels are written before repetition levels.
>
> However, after experimenting with parquet files generated by pandas and
> pyarrow and perusing the arrow source code (especially
> InitializeLevelDecoders in
> https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_reader.cc),
> I strongly believe that repetition levels are written before definition
> levels. I also found other documentation of the parquet format that puts
> repetition levels before definition levels:
> https://github.com/apache/parquet-format.
>
> The content of the parquet.apache.org/docs site appears to be tracked on
> GitHub under https://github.com/apache/parquet-site. Is the documentation
> content still being actively updated? Has there been an effort to
> synchronize the format descriptions under apache/parquet-site with those
> under apache/parquet-format?
>
> Kind regards
>
> Kaili
>
>
