tachyonwill commented on a change in pull request #11984:
URL: https://github.com/apache/arrow/pull/11984#discussion_r779722792
##########
File path: cpp/src/parquet/column_reader.cc
##########
@@ -993,6 +996,9 @@ int64_t TypedColumnReaderImpl<DType>::ReadBatch(int64_t batch_size, int16_t* def
*values_read = this->ReadValues(values_to_read, values);
int64_t total_values = std::max(num_def_levels, *values_read);
+ if (total_values == 0) {
+ ParquetException::EofException("Read 0 values");
Review comment:
I would expect to see min(batch_size, num_buffered_values_ -
num_decoded_values_) values. I will add this to the error message.

I don't entirely understand your second question. If the number of values a
page actually contains is inconsistent with the count in its header, bad things
can happen. This PR addresses one of those scenarios, where the reported count
is too high, which causes an infinite loop. There are other scenarios that I am
not addressing, however. For example, if the reported number of values is lower
than what actually exists, the extra values can be dropped.
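
To make the expected count concrete, here is a minimal sketch of the check with that count included in the error message. It is not the actual Arrow code: PageState, ExpectedValues and CheckProgress are hypothetical names, and std::runtime_error stands in for ParquetException::EofException.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the reader's bookkeeping; not the Arrow class.
struct PageState {
  int64_t num_buffered_values;  // value count reported by the page header
  int64_t num_decoded_values;   // values already returned to the caller
};

// Number of values one ReadBatch-style call is expected to produce:
// min(batch_size, num_buffered_values - num_decoded_values).
inline int64_t ExpectedValues(const PageState& s, int64_t batch_size) {
  return std::min(batch_size, s.num_buffered_values - s.num_decoded_values);
}

// Mirrors the `total_values == 0` check from the diff: if the decoder made
// no progress, throw instead of letting the caller loop forever.
// std::runtime_error is an assumption standing in for the Parquet EOF error.
inline void CheckProgress(const PageState& s, int64_t batch_size,
                          int64_t total_values) {
  if (total_values == 0) {
    throw std::runtime_error("Read 0 values, expected " +
                             std::to_string(ExpectedValues(s, batch_size)));
  }
}
```

With a header that claims 10 values of which 4 have been decoded, a batch of 100 should yield 6 values; if the page data is exhausted and the read returns 0, the check throws rather than spinning.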