rouault commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2068064220
Digging further: FileReaderImpl::DecodeRowGroups() already calls
Table::Validate(), but GetRecordBatchReader() does not. I have also
successfully tested the following alternative patch:
```
diff --git a/cpp/src/parquet/arrow/reader.cc b/cpp/src/parquet/arrow/reader.cc
index d6ad7c25b..9adb6e2c0 100644
--- a/cpp/src/parquet/arrow/reader.cc
+++ b/cpp/src/parquet/arrow/reader.cc
@@ -1044,6 +1044,7 @@ Status FileReaderImpl::GetRecordBatchReader(const std::vector<int>& row_groups,
}
auto table = ::arrow::Table::Make(batch_schema, std::move(columns));
+ RETURN_NOT_OK(table->Validate());
auto table_reader = std::make_shared<::arrow::TableBatchReader>(*table);
// NB: explicitly preserve table so that table_reader doesn't outlive it
```
With that patch, the reported error is:
"Column 18 named timestamp_us_no_tz expected length 5 but got length 2"
I'm not sure which approach is preferred.
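
For illustration only, here is a minimal standard-library sketch of the kind of column-length consistency check that produces an error like the one quoted above. It is not Arrow's implementation: `ColumnInfo` and `CheckColumnLengths` are hypothetical names, and the message format is simply modeled on the reported error text.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a table column: only a name and a length.
struct ColumnInfo {
  std::string name;
  std::int64_t length;
};

// Returns a message describing the first column whose length differs from
// num_rows, or std::nullopt when every column has the expected length.
// Assumption: the wording mirrors the error quoted in this comment.
std::optional<std::string> CheckColumnLengths(
    const std::vector<ColumnInfo>& columns, std::int64_t num_rows) {
  for (std::size_t i = 0; i < columns.size(); ++i) {
    if (columns[i].length != num_rows) {
      return "Column " + std::to_string(i) + " named " + columns[i].name +
             " expected length " + std::to_string(num_rows) +
             " but got length " + std::to_string(columns[i].length);
    }
  }
  return std::nullopt;
}
```

Calling `CheckColumnLengths` with a 5-row table in which one column holds only 2 values yields a message of the same shape as the one above, which is why running such a check before handing the table to a batch reader surfaces the inconsistency early.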