Hi Gang,

We could embed the README.md from parquet-format as an iframe on the Docsy website, which would be better than a bare URL link. Simply linking to it would also be easy. The other option, which seems to be what Iceberg does (https://iceberg.apache.org/docs/latest/ or https://iceberg.apache.org/docs/1.4.3/), is to version the entire set of docs and tie each version to a release of parquet-mr or parquet-format. I've mostly treated releases as blog posts for now (https://parquet.apache.org/blog/), but if that's not the best way to handle versioned docs, we can explore adopting Iceberg's model.

<vinoo.gan...@gmail.com>
On Mon, Mar 4, 2024 at 8:50 PM Gang Wu <ust...@gmail.com> wrote:

> Hi Vinoo,
>
> Thanks for the reply! How do you want to embed the README?
> Linking it to the parquet-format repo or just copying the whole content?
> IMO we might need to make it clear to the users that they know what
> version of the format they are looking at. Therefore linking to the
> format repo (and maybe add different versions as well) sounds much
> better to me.
>
> Best,
> Gang
>
> On Tue, Mar 5, 2024 at 3:18 AM Vinoo Ganesh <vinoo.gan...@gmail.com>
> wrote:
>
> > Hi All - Sorry I missed this email chain. I've been mostly responsible
> > for building the infrastructure around the new parquet-site website, but
> > have mostly left the existing content alone. I'm happy to just link to
> > the parquet-format repo, but that would mean the content is no longer
> > searchable from the website, and users would have to first find the link
> > to the parquet-format repo from the docs and then navigate there.
> >
> > I could just embed the parquet-format README in an iframe on the spec
> > docs. Alternatively, as part of the release actions, we can add a task
> > that opens an issue on parquet-site for update.
> >
> > Do people have thoughts / opinions on these two?
> >
> > On Thu, Jan 18, 2024 at 1:33 PM Kaili Zhang <kaili...@hotmail.com>
> > wrote:
> >
> > > Hi Gabor
> > >
> > > I am OK with that. As long as the information is up-to-date, whatever
> > > method most convenient for the devs will do.
> > >
> > > Kind regards
> > >
> > > Kaili
> > >
> > > ________________________________
> > > From: Gábor Szádovszky <ga...@apache.org>
> > > Sent: Monday, January 15, 2024 12:25:39 AM
> > > To: dev@parquet.apache.org <dev@parquet.apache.org>
> > > Subject: Re: Discrepancy in parquet format documentation
> > >
> > > Hey Gang, Kaili,
> > >
> > > I think the easiest way to solve this issue is to completely remove the
> > > spec from the site and add a reference to the parquet-format repo
> > > instead. We should probably add the release tag links when we make a
> > > release of parquet-format with a "latest" link. This way we would also
> > > avoid potential issues when someone would make decisions based on
> > > un-released spec changes.
> > >
> > > Cheers,
> > > Gabor
> > >
> > > Kaili Zhang <kaili...@hotmail.com> wrote (on Sat, Jan 13, 2024,
> > > 20:53):
> > >
> > > > Hi Gang
> > > >
> > > > Thank you for looking into this. Updating the description on
> > > > parquet.apache.org will save everyone searching for this information
> > > > a few hours of head scratching. It is unfortunate that the slightly
> > > > out-of-date spec features more prominently in Google results.
> > > >
> > > > Kind regards
> > > >
> > > > Kaili
> > > > ________________________________
> > > > From: Gang Wu <ust...@gmail.com>
> > > > Sent: Tuesday, January 9, 2024 5:56 PM
> > > > To: dev@parquet.apache.org <dev@parquet.apache.org>
> > > > Subject: Re: Discrepancy in parquet format documentation
> > > >
> > > > Hi Kaili,
> > > >
> > > > You're right. Please refer to the parquet-format repo for specs. The
> > > > site is unfortunately out of sync for a long time and there isn't any
> > > > automatic process to update it. Let me update the site manually to be
> > > > in sync with the latest format release.
> > > >
> > > > Best,
> > > > Gang
> > > >
> > > > On Sun, Jan 7, 2024 at 8:03 AM Kaili Zhang <kaili...@hotmail.com>
> > > > wrote:
> > > >
> > > > > Hi all
> > > > >
> > > > > I found this page via Google when searching for a description of
> > > > > the parquet binary format:
> > > > > https://parquet.apache.org/docs/file-format/data-pages/. This page
> > > > > suggests that definition levels are written before repetition
> > > > > levels.
> > > > >
> > > > > However, after experimenting with parquet files generated by pandas
> > > > > and pyarrow and perusing the arrow source code (especially
> > > > > InitializeLevelDecoders in
> > > > > https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_reader.cc),
> > > > > I strongly believe that repetition levels are written before
> > > > > definition levels. I also found this other documentation of parquet
> > > > > format that has repetition levels before definition levels:
> > > > > https://github.com/apache/parquet-format.
> > > > >
> > > > > The content of the parquet.apache.org/docs site appears to be
> > > > > tracked on Github under https://github.com/apache/parquet-site. Is
> > > > > the documentation content still being actively updated? Has there
> > > > > been an effort to synchronize the format descriptions under
> > > > > apache/parquet-site with those under apache/parquet-format?
> > > > >
> > > > > Kind regards
> > > > >
> > > > > Kaili