[GitHub] [arrow] tachyonwill commented on a change in pull request #11984: PARQUET-2109: [C++] Check if Parquet page has too few values

GitBox Thu, 06 Jan 2022 09:26:15 -0800


tachyonwill commented on a change in pull request #11984:
URL: https://github.com/apache/arrow/pull/11984#discussion_r779717505




##########
File path: cpp/src/parquet/column_reader.cc
##########
@@ -970,6 +970,9 @@ int64_t TypedColumnReaderImpl<DType>::ReadBatchWithDictionary(
   // Read dictionary indices.
   *indices_read = ReadDictionaryIndices(indices_to_read, indices);
   int64_t total_indices = std::max(num_def_levels, *indices_read);
+  if (total_indices == 0 && batch_size != 0) {
+    ParquetException::EofException("Read 0 values");

Review comment:
       This PR doesn't change the behavior for zero-length pages (assuming the
page is correctly formed). At the start of the ReadBatch* methods, HasNext() is
called, and we bail out gracefully if it returns false. A size-0 page causes
HasNext() to return false, so we stop reading there. Is this the right thing to
do? I don't know. It can cause weird behavior, and judging by some parquet-mr
JIRAs, size-0 pages might not be entirely legal.
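
To make the control flow described above concrete, here is a minimal toy model. It is only a sketch: `ToyColumnReader`, its members, and `DrainUntilEmptyBatch` are hypothetical names invented for illustration and are not the actual `TypedColumnReaderImpl` code. It assumes a page is fully described by its value count and that HasNext() loads the next page whenever the current one is exhausted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column reader. Each page is modeled only by
// the number of values it holds; this is NOT the Arrow/Parquet API.
class ToyColumnReader {
 public:
  explicit ToyColumnReader(std::vector<int64_t> page_sizes)
      : page_sizes_(std::move(page_sizes)) {}

  // Returns false when no pages remain, or when the page that was just
  // loaded holds zero values (the case discussed in the comment above).
  bool HasNext() {
    if (decoded_ == buffered_) {
      if (next_page_ >= page_sizes_.size()) return false;
      buffered_ = page_sizes_[next_page_++];
      decoded_ = 0;
      if (buffered_ == 0) return false;
    }
    return true;
  }

  // Bails out with 0 values if HasNext() is false; otherwise reads up to
  // batch_size values from the current page only.
  int64_t ReadBatch(int64_t batch_size) {
    assert(batch_size >= 0);
    if (!HasNext()) return 0;
    const int64_t n = std::min(batch_size, buffered_ - decoded_);
    decoded_ += n;
    return n;
  }

 private:
  std::vector<int64_t> page_sizes_;
  std::size_t next_page_ = 0;
  int64_t buffered_ = 0;  // values in the current page
  int64_t decoded_ = 0;   // values already consumed from the current page
};

// A caller that treats "0 values read" as end of data.
int64_t DrainUntilEmptyBatch(ToyColumnReader& reader, int64_t batch_size) {
  int64_t total = 0;
  while (true) {
    const int64_t n = reader.ReadBatch(batch_size);
    if (n == 0) break;
    total += n;
  }
  return total;
}
```

In this toy model, a caller that stops at the first empty batch never sees the values in pages that follow a size-0 page, even though a later ReadBatch call would return them. That is one possible form of the "weird behavior" mentioned above; whether the real reader behaves the same way is not something this sketch establishes.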




