fatemah created PARQUET-2175:
--------------------------------
Summary: Skip method skips levels and not rows for repeated fields
Key: PARQUET-2175
URL: https://issues.apache.org/jira/browse/PARQUET-2175
Project: Parquet
Issue Type: Bug
Components: parquet-cpp
Reporter: fatemah
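The current implementation of the TypedColumnReader::Skip method, with signature:
virtual int64_t Skip(int64_t num_levels_to_skip) = 0;
skips levels for both repeated and non-repeated fields. For a non-repeated field
each row contributes exactly one level, so skipping levels and skipping rows are
the same thing. For a repeated field, however, a single row can span several
levels, so skipping levels is of little use; we want to be able to skip rows.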
For example, given the following schema and rows:
message M { repeated int32 b = 1 }
rows: {}, {[10, 10]}, {[20, 20, 20]}
values = {10, 10, 20, 20, 20};
def_levels = {0, 1, 1, 1, 1, 1};
rep_levels = {0, 0, 1, 0, 1, 1};
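To show how these levels encode the three rows, here is a minimal sketch that reassembles them. AssembleRows is a hypothetical helper, not part of parquet-cpp; it assumes a top-level repeated int32 column with max definition level 1 and max repetition level 1, as in the schema above. A rep_level of 0 starts a new row, and only levels with def_level == 1 carry a value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the parquet-cpp API): rebuilds the rows of a
// top-level `repeated int32` column from decoded levels and values.
// Assumes max_def_level == 1, max_rep_level == 1, and that the first
// level starts a row (rep_level 0).
std::vector<std::vector<int32_t>> AssembleRows(
    const std::vector<int16_t>& def_levels,
    const std::vector<int16_t>& rep_levels,
    const std::vector<int32_t>& values) {
  assert(def_levels.size() == rep_levels.size());
  constexpr int16_t kMaxDefLevel = 1;
  std::vector<std::vector<int32_t>> rows;
  std::size_t value_index = 0;
  for (std::size_t i = 0; i < def_levels.size(); ++i) {
    if (rep_levels[i] == 0) {
      rows.emplace_back();  // rep_level 0 marks the start of a new row
    }
    assert(!rows.empty());
    if (def_levels[i] == kMaxDefLevel) {
      // A defined list element: consume the next stored value.
      rows.back().push_back(values[value_index++]);
    }
    // def_level 0 means the list is empty; no value is stored for it.
  }
  return rows;
}
```

Applied to the levels above, this yields three rows: an empty list, {10, 10} and {20, 20, 20}, while the six levels map to only five values.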
We want Skip(2) to skip the first two rows, so that the next value read is the
first 20. However, it currently skips the first two levels (the empty first row
and the first 10), so the next value read is the second 10.
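The difference between skipping levels and skipping rows can be sketched on in-memory level arrays. SkipLevels, SkipRows and Cursor are hypothetical names used only for illustration, not the parquet-cpp API; the sketch assumes max definition level 1 and that the cursor starts on a row boundary. SkipLevels models the level-based behaviour described above, while SkipRows keeps consuming levels until the next rep_level of 0, which marks the start of the following row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Position in a decoded column chunk: levels and values consumed so far.
struct Cursor {
  std::size_t level = 0;
  std::size_t value = 0;
};

// Hypothetical: level-based skip (assumes max_def_level == 1).
Cursor SkipLevels(const std::vector<int16_t>& def_levels, Cursor c,
                  int64_t num_levels) {
  for (int64_t n = 0; n < num_levels && c.level < def_levels.size(); ++n) {
    if (def_levels[c.level] == 1) ++c.value;  // only defined levels hold a value
    ++c.level;
  }
  return c;
}

// Hypothetical: row-based skip. Assumes `c` starts on a row boundary.
Cursor SkipRows(const std::vector<int16_t>& def_levels,
                const std::vector<int16_t>& rep_levels, Cursor c,
                int64_t num_rows) {
  assert(def_levels.size() == rep_levels.size());
  for (int64_t n = 0; n < num_rows && c.level < def_levels.size(); ++n) {
    // Consume the row's first level, then every level with rep_level > 0,
    // i.e. all further elements of the same row.
    do {
      if (def_levels[c.level] == 1) ++c.value;
      ++c.level;
    } while (c.level < rep_levels.size() && rep_levels[c.level] != 0);
  }
  return c;
}
```

On the example, skipping 2 levels leaves the value cursor at index 1 (the second 10), whereas skipping 2 rows advances past three levels and leaves it at index 2 (the first 20). A real reader would also have to handle row boundaries that cross page or batch limits, which this sketch ignores.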
--
This message was sent by Atlassian Jira
(v8.20.10#820010)