Can someone (pretty pretty) please give us some binary examples so we can make faster progress on the Rust implementation?
We recently got exciting news[1] that folks from the CMU database group have started working on the Rust implementation of variant, and I would very much like to encourage and support their work.

I am willing to do some legwork (make a PR to parquet-testing for example) if someone can point me to the files (or instructions on how to use some system to create variants).

I was hoping that since the VARIANT format[2] and draft shredding spec[3] have been in the repo for 6 months (since October 2024), it would be straightforward to provide some examples. Do we know anything that is blocking the creation of examples?

Andrew

[1]: https://github.com/apache/arrow-rs/issues/6736#issuecomment-2781556103
[2]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
[3]: https://github.com/apache/parquet-format/blob/master/VariantShredding.md

On Wed, Mar 5, 2025 at 3:58 PM Julien Le Dem <jul...@apache.org> wrote:

> That sounds like a great suggestion to me.
>
> On Wed, Mar 5, 2025 at 12:41 PM Andrew Lamb <andrewlam...@gmail.com> wrote:
>
> > I would like to request before the VARIANT spec changes are finalized that
> > we have example data in parquet-testing.
> >
> > This topic came up (well, I brought it up) on the sync call today.
> >
> > In my opinion, having example files would reduce the overhead of new
> > implementations dramatically. At least there should be example of
> > * variant columns (no shredding)
> > * variant columns with shredding
> >
> > Some description of what those files contained ("expected contents"). For
> > prior art, here is what Dewey did for the geometry type[1][2].
> >
> > When looking for prior discussions, I found a great quote from Gang Wu[3]
> > on this topic:
> >
> > > I'd say that a lesson learned is that we should publish example files for any
> > > new feature to the parquet-testing [1] repo for interoperability tests.
> >
> > Thank you for your consideration,
> > Andrew
> >
> > [1] https://github.com/apache/parquet-testing/pull/70
> > [2] https://github.com/geoarrow/geoarrow-data
> > [3]: https://lists.apache.org/thread/71d7p9lprhf514jnt5dgnw4wfmn8ykzt