In the V2 data page header, we have:

* num_values
* num_rows
* num_nulls

The V1 data page header, on the other hand, only has "num_values".

On a page representing a list, e.g. [[0, 1], None, [2, None, 3]], how
should each of these numbers be written in v1 and v2?

My current understanding from the docs is that for the example above, we
should write:

v2:
* num_values: 6
* num_rows: 3
* num_nulls: 2

v1:
* num_values: 6
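
To make the counting rule I am assuming explicit, here is a minimal
Python sketch. The function name count_page_stats is my own and not part
of any library. It assumes that num_values counts every level slot
(nulls included), that num_rows counts top-level rows, and that
num_nulls counts every slot with no stored value. Treating an empty list
like a null is also an assumption on my side.

```python
from typing import List, Optional, Tuple


def count_page_stats(
    rows: List[Optional[List[Optional[int]]]],
) -> Tuple[int, int, int]:
    """Return (num_values, num_rows, num_nulls) for one page.

    Hypothetical helper: it encodes my reading of the docs for a
    nullable list of nullable ints, not a confirmed rule.
    """
    num_values = 0
    num_nulls = 0
    for row in rows:
        if row is None or len(row) == 0:
            # A null (or, by assumption, empty) list still takes one
            # slot in the levels, but no value is stored for it.
            num_values += 1
            num_nulls += 1
            continue
        for item in row:
            # Every item takes one slot, whether it is null or not.
            num_values += 1
            if item is None:
                num_nulls += 1
    return num_values, len(rows), num_nulls


print(count_page_stats([[0, 1], None, [2, None, 3]]))
```

For the example above this gives (6, 3, 2), i.e. the v2 numbers I listed.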

But I am not sure this is correct. For example, pyarrow==4.0.0 writes:

v2:
* num_values: 6
* num_nulls: 1
* num_rows: 6

v1:
* num_values: 6

Is there any reference for this?

Are the extra numbers in v2 necessary to read a page? My understanding is
that (compressed_size, uncompressed_size, num_values) is enough to read
everything.

Best,
Jorge
