I feel like it's reasonable to put the specification in the 'parquet-format' repo and reduce the confusion that would be caused by having specs split across repos.
As for the implementations, we already know there will be multiple, and some will be in languages where there is no current repo in the parquet project. I agree with the proposed approach of a 'parquet-variant' project where we keep all of the different language implementations. There are a number of benefits, including keeping implementations more consistent and having a single place for reviewers/maintainers to focus their attention while the initial donation/implementation progresses. It's easier to split out an implementation if necessary than to combine them, and given the relatively small size of this feature, it may never be an issue.

Another thing to consider is that a lot of projects have custom implementations of a parquet read/write path. Requiring that they add a dependency on parquet-java or arrow-rs to get variant support, for example, feels like it would just cause more fragmentation across implementations, as they may choose to build their own. I feel like the fastest path to general adoption is to keep the implementation separate so that we can rely on reuse as much as possible.

-Dan

On Tue, Sep 10, 2024 at 2:29 AM Andrew Lamb <andrewlam...@gmail.com> wrote:

> From a Rust perspective, I think putting the spec in the parquet-format repo makes sense as it will become part of the parquet spec.
>
> In terms of what repository the rust variant implementation would live in:
> * if there are parquet committers who plan to help implement and maintain it, then putting it in parquet-variant could make sense
> * if the idea is that the existing parquet-rs maintainers would help maintain it, putting it in the existing `arrow-rs` repo makes more sense to me (this would likely also make initial development easier)
>
> Technically I would expect the rust implementation to be its own "crate" (equivalent of a library) that is released separately, that the parquet crate depended on but not the other way around.
>
> Hope that helps,
> Andrew
>
> On Tue, Sep 10, 2024 at 12:33 AM Gene Pang <gene.p...@gmail.com> wrote:
>
> > Hi all,
> >
> > The Spark community has agreed <https://lists.apache.org/thread/pkybo148j6qyn2wsjnmyrhqs3crn9b89> to move the Variant specification and implementation to the Parquet project.
> >
> > However, there are several details we need to figure out with the move to Parquet. I have started a document with some of the topics and details we need to finalize.
> >
> > https://docs.google.com/document/d/1guEzBQjzOEEZvvibeZjNraKmZHWtxQR95O_DvtZU0xw/edit?usp=sharing
> >
> > Please take a look at the document and leave comments, questions and feedback to help reach a conclusion.
> >
> > Thanks,
> > Gene