Hi,

I agree that "row" sounds clearer than "record". However, we have a class
RecordReader in parquet-cpp [1]. I am not sure whether we need to rename
it, and it is still considered an internal class.

[1]
https://github.com/apache/arrow/blob/4a2df663bc88c73b863e0c0036160f7f936574c2/cpp/src/parquet/column_reader.h#L312

Best,
Gang

On Wed, May 29, 2024 at 8:44 PM Antoine Pitrou <anto...@python.org> wrote:

>
> I agree that "row" is a more widespread terminology while "record" can
> be a bit head-scratching.
>
> Regards
>
> Antoine.
>
>
> On Wed, 29 May 2024 05:49:22 -0400
> Andrew Lamb <andrewlam...@gmail.com>
> wrote:
> > In the context of my PR trying to encode the consensus that records can't
> > span page boundaries[1], Antoine brought up the excellent point[2] that the
> > format[3] seems to use the terms "records" and "rows" to refer to the same
> > concept.
> >
> > I agree it would clarify the spec to use the same terminology throughout.
> > Given there are several fields named `num_rows` I propose changing
> > parquet.thrift to use the term "row" throughout.
> >
> > I can make another PR to do so if this seems like a good idea.
> >
> > Andrew
> > (P.S. the PR[1] is still waiting on some more review and merging :pray:)
> >
> > [1] https://github.com/apache/parquet-format/pull/244
> > [2] https://github.com/apache/parquet-format/pull/244#discussion_r1617320495
> > [3] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift
> >
>
>
>
>
