rouault commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2068064220

   Digging further, I saw that FileReaderImpl::DecodeRowGroups() already
calls Table::Validate(), but that GetRecordBatchReader() does not. I've also
successfully tested the following alternative patch:
   
   ```
   diff --git a/cpp/src/parquet/arrow/reader.cc b/cpp/src/parquet/arrow/reader.cc
   index d6ad7c25b..9adb6e2c0 100644
   --- a/cpp/src/parquet/arrow/reader.cc
   +++ b/cpp/src/parquet/arrow/reader.cc
   @@ -1044,6 +1044,7 @@ Status FileReaderImpl::GetRecordBatchReader(const std::vector<int>& row_groups,
            }
    
            auto table = ::arrow::Table::Make(batch_schema, std::move(columns));
   +        RETURN_NOT_OK(table->Validate());
            auto table_reader = std::make_shared<::arrow::TableBatchReader>(*table);
    
            // NB: explicitly preserve table so that table_reader doesn't outlive it
   ```
   
   With that patch, the error reported is "Column 18 named timestamp_us_no_tz expected length 5 but got length 2".
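   
   For illustration, the kind of per-column length check that produces a message of this shape can be sketched as follows. This is a standalone sketch with no Arrow dependency: `ColumnInfo` and `CheckColumnLengths` are hypothetical names, not Arrow APIs, and the code is not Arrow's actual implementation of Table::Validate(). It only mirrors the format of the message quoted above.
   
   ```cpp
   #include <cstddef>
   #include <optional>
   #include <string>
   #include <vector>

   // Hypothetical stand-in for a table column: only its name and length.
   struct ColumnInfo {
     std::string name;
     std::size_t length;
   };

   // Returns a message describing the first column whose length differs from
   // expected_rows, or std::nullopt when every column has the expected length.
   // Assumption: the message format imitates the error quoted in this comment.
   std::optional<std::string> CheckColumnLengths(
       const std::vector<ColumnInfo>& columns, std::size_t expected_rows) {
     for (std::size_t i = 0; i < columns.size(); ++i) {
       if (columns[i].length != expected_rows) {
         return "Column " + std::to_string(i) + " named " + columns[i].name +
                " expected length " + std::to_string(expected_rows) +
                " but got length " + std::to_string(columns[i].length);
       }
     }
     return std::nullopt;
   }
   ```
   
   Under these assumptions, a table expected to have 5 rows whose column at index 18 has only 2 rows would yield the message above, while a table with consistent column lengths would yield no error.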
   
   I'm not sure which approach is preferred.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
