I filed a JIRA[1] and a PR[2] to change parquet.thrift to use "row".

Thanks,
Andrew

[1]: https://issues.apache.org/jira/browse/PARQUET-2488
[2]: https://github.com/apache/parquet-format/pull/256

On Wed, May 29, 2024 at 8:45 AM Antoine Pitrou <anto...@python.org> wrote:

> I agree that "row" is a more widespread terminology while "record" can
> be a bit head-scratching.
>
> Regards
>
> Antoine.
>
>
> On Wed, 29 May 2024 05:49:22 -0400
> Andrew Lamb <andrewlam...@gmail.com>
> wrote:
> > In the context of my PR trying to encode the consensus that records can't
> > span page boundaries[1], Antoine brought up the excellent point[2] that the
> > format[3] seems to use the terms "records" and "rows" to refer to the same
> > concept.
> >
> > I agree it would clarify the spec to use the same terminology throughout.
> > Given there are several fields named `num_rows` I propose changing
> > parquet.thrift to use the term "row" throughout.
> >
> > I can make another PR to do so if this seems like a good idea.
> >
> > Andrew
> > (p.s the PR[1] is still waiting on some more review and merging :pray:)
> >
> > [1] https://github.com/apache/parquet-format/pull/244
> > [2] https://github.com/apache/parquet-format/pull/244#discussion_r1617320495
> > [3] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift