tachyonwill commented on a change in pull request #11984:
URL: https://github.com/apache/arrow/pull/11984#discussion_r779717505
##########
File path: cpp/src/parquet/column_reader.cc
##########
@@ -970,6 +970,9 @@ int64_t TypedColumnReaderImpl<DType>::ReadBatchWithDictionary(
// Read dictionary indices.
*indices_read = ReadDictionaryIndices(indices_to_read, indices);
int64_t total_indices = std::max(num_def_levels, *indices_read);
+ if (total_indices == 0 && batch_size != 0) {
+ ParquetException::EofException("Read 0 values");
Review comment:
The PR doesn't change the behavior for zero-length pages (assuming the page
is well formed). At the start of each ReadBatch* method, HasNext() is
called, and we bail out gracefully if it returns false. A page with 0 values
causes HasNext() to return false, so reading stops there. Is that the right
behavior? I'm not sure. It can lead to surprising results, and judging from
some parquet-mr JIRA issues, size-0 pages may not be strictly legal.
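The control flow described in the comment can be made concrete with a small C++ model. This is a hypothetical sketch, not parquet-cpp code: `ToyPage`, `ToyColumnReader` and all their members are invented for illustration, and having HasNext() look only at the value count in the page header is a simplifying assumption. It shows that a page declaring 0 values makes HasNext() return false, so ReadBatch returns 0 before the new "Read 0 values" check is ever reached. By contrast, a page whose header promises values that the decoder cannot produce does trip the check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical page: what the header claims vs. what the decoder can produce.
struct ToyPage {
  int64_t header_values;  // value count declared in the page header
  int64_t data_values;    // values actually decodable (fewer if truncated)
};

// Hypothetical, simplified stand-in for a typed column reader.
// Not the parquet-cpp API; names are illustrative only.
class ToyColumnReader {
 public:
  explicit ToyColumnReader(std::vector<ToyPage> pages)
      : pages_(std::move(pages)) {}

  // Modeled on HasNext(): when the current page is used up, load the next.
  // A page whose header declares 0 values makes this return false, so a
  // caller looping on HasNext() stops there, even if more pages follow.
  bool HasNext() {
    if (decoded_ == buffered_) {
      if (next_page_ >= pages_.size()) return false;
      const ToyPage& page = pages_[next_page_++];
      buffered_ = page.header_values;
      available_ = page.data_values;
      decoded_ = 0;
      if (buffered_ == 0) return false;
    }
    return true;
  }

  // Shaped like ReadBatch*: return 0 when HasNext() is false; otherwise
  // decode, and treat "0 values for a non-empty request" as an unexpected
  // end of stream (mirroring the guard in the diff).
  int64_t ReadBatch(int64_t batch_size) {
    assert(batch_size >= 0);
    if (!HasNext()) return 0;  // size-0 pages exit here; the guard never runs
    int64_t wanted = std::min(batch_size, buffered_ - decoded_);
    int64_t got = std::min(wanted, available_);
    available_ -= got;
    decoded_ += got;
    if (got == 0 && batch_size != 0) {
      throw std::runtime_error("Unexpected end of stream: Read 0 values");
    }
    return got;
  }

 private:
  std::vector<ToyPage> pages_;
  std::size_t next_page_ = 0;
  int64_t buffered_ = 0;   // values declared by the current page header
  int64_t decoded_ = 0;    // values consumed from the current page
  int64_t available_ = 0;  // decodable values left in the current page
};
```

In this model, a loop such as `while (r.HasNext()) total += r.ReadBatch(10);` over pages declaring 3, 0 and 2 values reads only 3 values. The empty middle page ends the loop and the last page is never reached, which is one plausible form of the surprising behavior mentioned above. Without the guard, a truncated page (header says 4, decoder yields 0) would leave `decoded_` stuck, so the same loop would spin forever in this model.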