> With regards to Variant implementations, for Java, don't we need the format
> released before the implementation can be provided (I thought parquet-java
> consumed a released parquet-format jar in its build)?

For parquet-java, the PoC PR is usually built against a locally built,
unreleased parquet-format while the spec change is under review. Once the
vote has passed and a new parquet-format is released, the PoC PR is rebased
onto the released format for a final review. Below are some examples:

float16: https://github.com/apache/parquet-java/pull/1142
size stats: https://github.com/apache/parquet-java/pull/1177
geometry: https://github.com/apache/parquet-java/pull/2971

Best,
Gang

On Wed, Dec 4, 2024 at 2:57 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Gene,
>
> Before the release, I added a proposal to have a shredding version added to
> the annotation (https://github.com/apache/parquet-format/pull/474). It would
> be good to discuss whether people think there is value in this.
>
>
>
> > However, there was a discussion [2] on the requirement of two PoC reference
> > implementations when promoting a new format change.
>
>
> With regards to Variant implementations, for Java, don't we need the format
> released before the implementation can be provided (I thought parquet-java
> consumed a released parquet-format jar in its build)?
>
>
> > However, there was a discussion [2] on the requirement of two PoC reference
> > implementations when promoting a new format change. There are also concerns
> > from the variant logical type PR [3] against parquet-java. This is something
> > to discuss in the community if we want to make the variant type an exception.
>
>
> I thought the compromise we came to is that the documentation for Variant
> states that it is still experimental (maybe we should add this as a comment
> to parquet.thrift as well to make this very clear). I was under the
> impression that Variant would stay experimental until the 2 implementations
> were complete. I think we should clarify the scope of what we think is
> acceptable for the implementations (but that should probably be a separate
> thread). I also have some concerns about the current variant spec after
> reviewing the initial spec and the proposed shredding simplification [1], which
> I'll raise on a separate thread.
>
> Thanks,
> Micah
>
> [1] https://github.com/apache/parquet-format/pull/461
>
>
>
> On Tue, Dec 3, 2024 at 10:28 PM Gang Wu <ust...@gmail.com> wrote:
>
> > Hi Gene,
> >
> > Thanks for your effort on adding the variant type to parquet-format! For the
> > next release, I'd like to include the geometry type [1] as well, which is also
> > targeted for the Iceberg V3 spec. I can volunteer to be the release manager.
> >
> > However, there was a discussion [2] on the requirement of two PoC reference
> > implementations when promoting a new format change. There are also concerns
> > from the variant logical type PR [3] against parquet-java. This is something
> > to discuss in the community if we want to make the variant type an exception.
> >
> > [1] https://github.com/apache/parquet-format/pull/240
> > [2] https://lists.apache.org/thread/f9379yx0lf5gtpkgyv922pvowtzy4kmm
> > [3] https://github.com/apache/parquet-java/pull/3072
> >
> > Best,
> > Gang
> >
> > On Wed, Dec 4, 2024 at 2:08 PM Gene Pang <gene.p...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > We updated parquet-format <https://github.com/apache/parquet-format> to
> > > include the Variant logical type annotation. Would someone be able to
> > > release parquet-format (and create the necessary artifacts) so that
> > > parquet-java can be updated to depend on the new release? This would enable
> > > adding an implementation in parquet-java.
> > >
> > > Thanks!
> > > Gene
> > >
> >
>
