> With regards to Variant implementations, for Java, don't we need the format
> released before the implementation can be provided (I thought parquet-java
> consumed a released parquet-format jar in its build)?
For parquet-java, the PoC PR is usually based on a locally built parquet-format with an unreleased version while the spec change is under review. Once the vote has passed and a new parquet-format is released, the PoC PR gets rebased on the released format for a final review. Below are some examples:

float16: https://github.com/apache/parquet-java/pull/1142
size stats: https://github.com/apache/parquet-java/pull/1177
geometry: https://github.com/apache/parquet-java/pull/2971

Best,
Gang

On Wed, Dec 4, 2024 at 2:57 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Gene,
>
> Before release, I added a proposal to have a shredding version added to the
> annotation (https://github.com/apache/parquet-format/pull/474); it would be
> good to discuss if people think there is value in this.
>
> > However, there was a discussion [2] on the requirement of two PoC reference
> > implementations when promoting a new format change.
>
> With regards to Variant implementations, for Java, don't we need the format
> released before the implementation can be provided (I thought parquet-java
> consumed a released parquet-format jar in its build)?
>
> > However, there was a discussion [2] on the requirement of two PoC reference
> > implementations when promoting a new format change. There are also concerns
> > from the variant logical type PR [3] against parquet-java. This is something to
> > discuss in the community if we want to make the variant type an exception.
>
> I thought the compromise we came to is that the documentation for Variant
> states that it is still experimental (maybe we should add this as a comment
> to parquet.thrift as well to make this very clear). I was under the
> impression that Variant would stay experimental until the 2 implementations
> were complete. I think we should clarify the scope of what we think is
> acceptable for the implementations (but that should probably be a separate
> thread).
>
> I also have some concerns about some of the current variant spec after
> reviewing the initial spec and the proposed shredding simplification [1],
> which I'll raise on a separate thread.
>
> Thanks,
> Micah
>
> [1] https://github.com/apache/parquet-format/pull/461
>
> On Tue, Dec 3, 2024 at 10:28 PM Gang Wu <ust...@gmail.com> wrote:
>
> > Hi Gene,
> >
> > Thanks for your effort on adding the variant type to parquet-format! For the
> > next release, I'd like to include the geometry type [1] as well, which is also
> > targeted for the Iceberg V3 spec. I can volunteer to be the release manager.
> >
> > However, there was a discussion [2] on the requirement of two PoC reference
> > implementations when promoting a new format change. There are also concerns
> > from the variant logical type PR [3] against parquet-java. This is something to
> > discuss in the community if we want to make the variant type an exception.
> >
> > [1] https://github.com/apache/parquet-format/pull/240
> > [2] https://lists.apache.org/thread/f9379yx0lf5gtpkgyv922pvowtzy4kmm
> > [3] https://github.com/apache/parquet-java/pull/3072
> >
> > Best,
> > Gang
> >
> > On Wed, Dec 4, 2024 at 2:08 PM Gene Pang <gene.p...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > We updated parquet-format <https://github.com/apache/parquet-format> to
> > > include the Variant logical type annotation. Would someone be able to
> > > release parquet-format (and create the necessary artifacts) so that
> > > parquet-java can be updated to depend on the new release? This would enable
> > > adding an implementation in parquet-java.
> > >
> > > Thanks!
> > > Gene