I like the idea of machine-generated compatibility tests, as this makes it
easier to keep that page up to date.

An initial, manually generated one with a date stamp would be good though.
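
To make the idea concrete, here is a rough sketch of how a date-stamped
matrix could be rendered from test results. Everything in it is an
assumption for illustration: the render_matrix helper, the shape of the
results dict, and the implementation names are hypothetical, and nothing
here reflects how the parquet-site page is actually built.

```python
from datetime import date


def render_matrix(results, generated_on):
    """Render {feature: {implementation: bool}} as a Markdown table.

    Hypothetical helper: True means a round-trip test passed, False means
    it failed, and a missing entry is shown as "?" (not tested).
    """
    # Collect every implementation that appears in any feature row.
    impls = sorted({impl for row in results.values() for impl in row})
    lines = [f"Generated on {generated_on.isoformat()}", ""]
    lines.append("| Feature | " + " | ".join(impls) + " |")
    lines.append("|" + " --- |" * (len(impls) + 1))
    for feature in sorted(results):
        cells = []
        for impl in impls:
            status = results[feature].get(impl)
            cells.append("?" if status is None else ("yes" if status else "no"))
        lines.append(f"| {feature} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up results, only to show the output format.
    sample = {
        "DELTA_BINARY_PACKED": {"impl-a": True, "impl-b": False},
        "page index": {"impl-a": True},
    }
    print(render_matrix(sample, date.today()))
```

The "?" state keeps "not tested yet" distinct from "known not to work",
which matters while the page is still being populated.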

There are also things like this, which looks less like a data marshalling
issue and more like a column size reporting interop issue with data
generated by R-lang's nanoparquet:
https://github.com/apache/parquet-java/issues/3043

On Wed, 20 Nov 2024 at 21:38, Antoine Pitrou <anto...@python.org> wrote:

>
> Hi Andrew,
>
> On Tue, 19 Nov 2024 16:05:16 -0500
> Andrew Lamb <andrewlam...@gmail.com>
> wrote:
> > This is entirely an ecosystem / people / momentum problem in my opinion
> > (not a technical one)
>
> +1.
>
> > I have some thoughts [3] on how to help (compatibility matrix, define
> > what "compatible" means, etc) but I haven't been able to get people
> > excited about it 🤷)
>
> Well, I do think doing this would be important for the project.
>
> Also, I think most people here are unaware that
>
> https://github.com/apache/parquet-site/blob/production/content/en/docs/File%20Format/implementationstatus.md
> actually awaits contributors to populate it :-) I just noticed it
> myself. So perhaps opening GH issues (one per implementation?) and
> pinging the relevant people would get things going?
>
> Last thing: once we have a compatibility matrix like Arrow does, what
> could be useful as well would be to define calendar-based "profiles".
> Instead of tediously enabling individual options in your Parquet
> writer, you would select "Parquet profile 2023.4" and that would mean
> "enable all features that were present in major implementations in
> April 1st 2023".
>
> Regards
>
> Antoine.
>
>
>
