I like the idea of machine-generated compatibility tests, as this makes it easier to keep that page up to date.
An initial, manually generated one with a date stamp would be good, though.

There are also things like this, which looks less like a data marshalling issue and more like a column-size reporting interop issue with data generated by R-lang's nanoparquet:
https://github.com/apache/parquet-java/issues/3043

On Wed, 20 Nov 2024 at 21:38, Antoine Pitrou <anto...@python.org> wrote:
>
> Hi Andrew,
>
> On Tue, 19 Nov 2024 16:05:16 -0500
> Andrew Lamb <andrewlam...@gmail.com>
> wrote:
> > This is entirely an ecosystem / people / momentum problem in my opinion
> > (not a technical one)
>
> +1.
>
> > I have some thoughts [3] on how to help (compatibility matrix, define what
> > "compatible" means, etc) but I haven't been able to get people excited
> > about it 🤷)
>
> Well, I do think doing this would be important for the project.
>
> Also, I think most people here are unaware that
> https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> actually awaits contributors to populate it :-) I just noticed it
> myself. So perhaps opening GH issues (one per implementation?) and
> pinging the relevant people would get things going?
>
> Last thing: once we have a compatibility matrix like Arrow does, what
> could be useful as well would be to define calendar-based "profiles".
> Instead of tediously enabling individual options in your Parquet
> writer, you would select "Parquet profile 2023.4" and that would mean
> "enable all features that were present in major implementations in
> April 1st 2023".
>
> Regards
>
> Antoine.