Can someone (pretty, pretty please) give us some binary examples so we can
make faster progress on the Rust implementation?

We recently got exciting news[1] that folks from the CMU database group
have started working on the Rust implementation of variant, and I would
very much like to encourage and support their work.

I am willing to do some legwork (make a PR to parquet-testing for example)
if someone can point me to the files (or instructions on how to use some
system to create variants).

I was hoping that since the VARIANT format[2] and draft shredding spec[3]
have been in the repo for 6 months (since October 2024), it would be
straightforward to provide some examples. Do we know of anything that is
blocking the creation of examples?

Andrew

[1]: https://github.com/apache/arrow-rs/issues/6736#issuecomment-2781556103
[2]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
[3]: https://github.com/apache/parquet-format/blob/master/VariantShredding.md


On Wed, Mar 5, 2025 at 3:58 PM Julien Le Dem <jul...@apache.org> wrote:

> That sounds like a great suggestion to me.
>
> On Wed, Mar 5, 2025 at 12:41 PM Andrew Lamb <andrewlam...@gmail.com>
> wrote:
>
> > I would like to request before the VARIANT spec changes are finalized that
> > we have example data in parquet-testing.
> >
> > This topic came up (well, I brought it up) on the sync call today.
> >
> > In my opinion, having example files would reduce the overhead of new
> > implementations dramatically. At a minimum, there should be examples of:
> > * variant columns (no shredding)
> > * variant columns with shredding
> >
> > There should also be some description of what those files contain
> > ("expected contents"). For prior art, here is what Dewey did for the
> > geometry type[1][2].
> >
> > When looking for prior discussions, I found a great quote from Gang Wu[3]
> > on this topic:
> >
> > > I'd say that a lesson learned is that we should publish example files
> > > for any new feature to the parquet-testing [1] repo for interoperability
> > > tests.
> >
> > Thank you for your consideration,
> > Andrew
> >
> >
> >
> >
> > [1]: https://github.com/apache/parquet-testing/pull/70
> > [2]: https://github.com/geoarrow/geoarrow-data
> > [3]: https://lists.apache.org/thread/71d7p9lprhf514jnt5dgnw4wfmn8ykzt
> >
>
