This is an automated email from the ASF dual-hosted git repository.
uwe pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-cpp.git
The following commit(s) were added to refs/heads/master by this push:
new aa7a5e5 PARQUET-1272: Return correct row count for nested columns in ScanFileContents
aa7a5e5 is described below
commit aa7a5e5f34f2eada56e5d2ae896d85fe2a139747
Author: Korn, Uwe <[email protected]>
AuthorDate: Wed Apr 18 13:04:26 2018 +0200
PARQUET-1272: Return correct row count for nested columns in ScanFileContents
Stumbled over this while adding lists to the `alltypes_sample` in
`test_parquet.py` in Arrow.
Author: Korn, Uwe <[email protected]>
Closes #457 from xhochy/PARQUET-1272 and squashes the following commits:
45efe1c [Korn, Uwe] PARQUET-1272: Return correct row count for nested columns in ScanFileContents
---
src/parquet/file_reader.cc | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/src/parquet/file_reader.cc b/src/parquet/file_reader.cc
index 983d2d0..0632872 100644
--- a/src/parquet/file_reader.cc
+++ b/src/parquet/file_reader.cc
@@ -347,9 +347,18 @@ int64_t ScanFileContents(std::vector<int> columns, const int32_t column_batch_si
       int64_t values_read = 0;
       while (col_reader->HasNext()) {
-        total_rows[col] +=
+        int64_t levels_read =
             ScanAllValues(column_batch_size, def_levels.data(), rep_levels.data(),
                           values.data(), &values_read, col_reader.get());
+        if (col_reader->descr()->max_repetition_level() > 0) {
+          for (int64_t i = 0; i < levels_read; i++) {
+            if (rep_levels[i] == 0) {
+              total_rows[col]++;
+            }
+          }
+        } else {
+          total_rows[col] += levels_read;
+        }
       }
       col++;
     }