Hi Gang - We could embed the README.md from parquet-format as an iframe on
the docsy website (something better than just a URL link). It would also be
easy to just link. The other option, which seems to be what Iceberg does (
https://iceberg.apache.org/docs/latest/ or
https://iceberg.apache.org/docs/1.4.3/), is to actually version the entire
set of docs and tie it to a version of parquet-mr or parquet-format. I've
mostly treated releases as blog posts for now
(https://parquet.apache.org/blog/), but if that's not the best way to handle
versioned docs, we can explore adopting Iceberg's model.
<vinoo.gan...@gmail.com>


On Mon, Mar 4, 2024 at 8:50 PM Gang Wu <ust...@gmail.com> wrote:

> Hi Vinoo,
>
> Thanks for the reply! How do you want to embed the README?
> Linking it to the parquet-format repo or just copying the whole content?
> IMO we might need to make it clear to users what
> version of the format they are looking at. Therefore linking to the
> format repo (and maybe adding different versions as well) sounds much
> better to me.
>
> Best,
> Gang
>
> On Tue, Mar 5, 2024 at 3:18 AM Vinoo Ganesh <vinoo.gan...@gmail.com>
> wrote:
>
> > Hi All - Sorry I missed this email chain. I've been mostly responsible
> > for building the infrastructure around the new parquet-site website, but
> > have mostly left the existing content alone. I'm happy to just link to
> > the parquet-format repo, but that would mean the content is no longer
> > searchable from the website, and users would have to first find the link
> > to the parquet-format repo from the docs and then navigate there.
> >
> > I could just embed the parquet-format README in an iframe on the spec
> > docs. Alternatively, as part of the release actions, we can add a task
> > that opens an issue on parquet-site to update the docs.
> >
> > Do people have thoughts / opinions on these two?
> >
> > On Thu, Jan 18, 2024 at 1:33 PM Kaili Zhang <kaili...@hotmail.com> wrote:
> >
> > > Hi Gabor
> > >
> > > I am OK with that. As long as the information is up-to-date, whatever
> > > method is most convenient for the devs will do.
> > >
> > > Kind regards
> > >
> > > Kaili
> > >
> > > ________________________________
> > > From: Gábor Szádovszky <ga...@apache.org>
> > > Sent: Monday, January 15, 2024 12:25:39 AM
> > > To: dev@parquet.apache.org <dev@parquet.apache.org>
> > > Subject: Re: Discrepancy in parquet format documentation
> > >
> > > Hey Gang, Kaili,
> > >
> > > I think the easiest way to solve this issue is to completely remove the
> > > spec from the site and add a reference to the parquet-format repo
> > > instead. We should probably add the release tag links, along with a
> > > "latest" link, when we make a release of parquet-format. This way we
> > > would also avoid potential issues where someone makes decisions based
> > > on unreleased spec changes.
> > >
> > > Cheers,
> > > Gabor
> > >
> > > Kaili Zhang <kaili...@hotmail.com> wrote (on Sat, Jan 13, 2024, 20:53):
> > >
> > > > Hi Gang
> > > >
> > > > Thank you for looking into this. Updating the description on
> > > > parquet.apache.org will save everyone searching for this information
> > > > a few hours of head scratching. It is unfortunate that the slightly
> > > > out-of-date spec features more prominently in Google results.
> > > >
> > > > Kind regards
> > > >
> > > > Kaili
> > > > ________________________________
> > > > From: Gang Wu <ust...@gmail.com>
> > > > Sent: Tuesday, January 9, 2024 5:56 PM
> > > > To: dev@parquet.apache.org <dev@parquet.apache.org>
> > > > Subject: Re: Discrepancy in parquet format documentation
> > > >
> > > > Hi Kaili,
> > > >
> > > > You're right. Please refer to the parquet-format repo for specs. The
> > > > site has unfortunately been out of sync for a long time and there isn't
> > > > any automatic process to update it. Let me update the site manually to
> > > > be in sync with the latest format release.
> > > >
> > > > Best,
> > > > Gang
> > > >
> > > > On Sun, Jan 7, 2024 at 8:03 AM Kaili Zhang <kaili...@hotmail.com> wrote:
> > > >
> > > > > Hi all
> > > > >
> > > > > I found this page via Google when searching for a description of the
> > > > > parquet binary format:
> > > > > https://parquet.apache.org/docs/file-format/data-pages/. This page
> > > > > suggests that definition levels are written before repetition levels.
> > > > >
> > > > > However, after experimenting with parquet files generated by pandas
> > > > > and pyarrow and perusing the arrow source code (especially
> > > > > InitializeLevelDecoders in
> > > > > https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_reader.cc),
> > > > > I strongly believe that repetition levels are written before
> > > > > definition levels. I also found this other documentation of the parquet
> > > > > format that has repetition levels before definition levels:
> > > > > https://github.com/apache/parquet-format.
> > > > >
> > > > > The content of the parquet.apache.org/docs site appears to be tracked
> > > > > on GitHub under https://github.com/apache/parquet-site. Is the
> > > > > documentation content still being actively updated? Has there been an
> > > > > effort to synchronize the format descriptions under apache/parquet-site
> > > > > with those under apache/parquet-format?
> > > > >
> > > > > Kind regards
> > > > >
> > > > > Kaili
> > > > >
> > > > >
> > > >
> > >
> >
>
