I agree on making the naming consistent. "Row" is a good choice.
I also agree that only the Thrift field IDs are part of the spec, since
that's what ends up in the binary.
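
As a rough illustration of that point, here is a minimal sketch (not the actual Thrift library) of how the Thrift compact protocol, which Parquet uses for its metadata, writes an i64 field: a header built from the field ID delta and a type code, followed by the value. The field name never appears in the output. The helper name encode_i64_field and its field_name parameter are hypothetical, and field ID 3 is only an example value.

```python
COMPACT_I64 = 6  # type code for i64 in the Thrift compact protocol


def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one (zigzag encoding)."""
    return (n << 1) ^ (n >> 63)


def varint(n: int) -> bytes:
    """Encode a non-negative integer as a little-endian base-128 varint."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)
            return bytes(out)


def encode_i64_field(field_name: str, field_id: int, value: int,
                     last_field_id: int = 0) -> bytes:
    """Encode one i64 struct field; field_name is deliberately unused."""
    delta = field_id - last_field_id
    if 0 < delta <= 15:
        # Short form: ID delta in the high nibble, type in the low nibble.
        header = bytes([(delta << 4) | COMPACT_I64])
    else:
        # Long form: type byte, then the field ID as a zigzag varint.
        header = bytes([COMPACT_I64]) + varint(zigzag(field_id))
    return header + varint(zigzag(value))


renamed = encode_i64_field("num_records", 3, 100)
original = encode_i64_field("num_rows", 3, 100)
print(original == renamed)  # renaming the field does not change the bytes
```

Under these assumptions, renaming a field leaves the encoded bytes untouched, while changing its ID changes the header byte.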

On Fri, May 31, 2024 at 13:41 Andrew Lamb <andrewlam...@gmail.com> wrote:

> I filed a JIRA[1] and a PR[2] to change parquet.thrift to use "row"
>
> Thanks,
> Andrew
>
> [1]: https://issues.apache.org/jira/browse/PARQUET-2488
> [2]: https://github.com/apache/parquet-format/pull/256
>
> On Wed, May 29, 2024 at 8:45 AM Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > I agree that "row" is a more widespread terminology while "record" can
> > be a bit head-scratching.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On Wed, 29 May 2024 05:49:22 -0400
> > Andrew Lamb <andrewlam...@gmail.com>
> > wrote:
> > > > In the context of my PR trying to encode the consensus that records
> > > > can't span page boundaries[1], Antoine brought up the excellent
> > > > point[2] that the format[3] seems to use the terms "records" and
> > > > "rows" to refer to the same concept.
> > >
> > > > I agree it would clarify the spec to use the same terminology
> > > > throughout. Given there are several fields named `num_rows` I propose
> > > > changing parquet.thrift to use the term "row" throughout.
> > >
> > > I can make another PR to do so if this seems like a good idea.
> > >
> > > Andrew
> > > > (P.S. the PR[1] is still waiting on some more review and merging :pray:)
> > >
> > > [1] https://github.com/apache/parquet-format/pull/244
> > > [2]
> > https://github.com/apache/parquet-format/pull/244#discussion_r1617320495
> > > [3]
> > >
> >
> https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift
> > >
> >
> >
> >
> >
>
