I feel like it's reasonable to put the specification in the
'parquet-format' repo and reduce the confusion that would be caused by
having specs split across repos.

As for the implementations, we already know there will be several, and some
will be in languages that have no repo in the Parquet project
today.  I agree with the proposed approach of a 'parquet-variant' project
where we keep all of the different language implementations.  There are a
number of benefits including keeping implementations more consistent and
having a single place for reviewers/maintainers to focus their attention
while the initial donation/implementation progresses.  It's easier to
split out an implementation later, if necessary, than to combine separate
ones, and given the relatively small size of this feature, it may never be
an issue.

Another thing to consider is that a lot of projects have custom
implementations of a parquet read/write path.  Requiring that they add a
dependency on parquet-java or arrow-rs to get variant support, for example,
feels like it would just cause more fragmentation, as they may choose to
build their own implementations instead.  I feel like the fastest path to
general adoption is to keep the implementation separate so that we can rely
on reuse as much as possible.

-Dan



On Tue, Sep 10, 2024 at 2:29 AM Andrew Lamb <andrewlam...@gmail.com> wrote:

> From a Rust perspective, I think putting the spec in the parquet-format
> repo makes sense as it will become part of the parquet spec.
>
> In terms of what repository the rust variant implementation would live in:
> * if there are parquet committers who plan to help implement and maintain
> it, then putting it in parquet-variant could make sense
> * if the idea is that the existing parquet-rs maintainers would help
> maintain it, putting it in the existing `arrow-rs` repo makes more sense to
> me (this would likely also make initial development easier)
>
> Technically I would expect the rust implementation to be its own "crate"
> (equivalent of a library) that is released separately, that the parquet
> crate depended on but not the other way around.
>
> Hope that helps,
> Andrew
>
> On Tue, Sep 10, 2024 at 12:33 AM Gene Pang <gene.p...@gmail.com> wrote:
>
> > Hi all,
> >
> > The Spark community has agreed
> > <https://lists.apache.org/thread/pkybo148j6qyn2wsjnmyrhqs3crn9b89> to move
> > the Variant specification and implementation to the Parquet project.
> >
> > However, there are several details we need to figure out with the move to
> > Parquet. I have started a document with some of the topics and details we
> > need to finalize.
> >
> >
> > https://docs.google.com/document/d/1guEzBQjzOEEZvvibeZjNraKmZHWtxQR95O_DvtZU0xw/edit?usp=sharing
> >
> > Please take a look at the document and leave comments, questions and
> > feedback to help reach a conclusion.
> >
> > Thanks,
> > Gene
> >
>
