tustvold commented on code in PR #2027:
URL: https://github.com/apache/arrow-rs/pull/2027#discussion_r916840405


##########
parquet/src/arrow/record_reader/mod.rs:
##########
@@ -193,10 +180,7 @@ where
             };
 
             // Try to more value from parquet pages
-            let values_read = self.read_one_batch(batch_size)?;
-            if values_read < batch_size {

Review Comment:
   This is the cause of the bug: the end of the column chunk was being detected
based on whether read_one_batch returned fewer values than the batch size. If
the batch_size happens to exactly match the number of remaining records, this
check would fail, and the reader would miss the last record in the chunk.
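
A minimal sketch of the failure mode described in this comment, not the actual record reader code. `Chunk`, its `remaining` field, and both `end_detected_*` helpers are hypothetical names made up for illustration; only the `values_read < batch_size` comparison comes from the diff above.

```rust
/// Hypothetical stand-in for a column chunk that still holds `remaining` records.
struct Chunk {
    remaining: usize,
}

impl Chunk {
    /// Reads up to `batch_size` records and returns how many were actually read.
    fn read_one_batch(&mut self, batch_size: usize) -> usize {
        let n = batch_size.min(self.remaining);
        self.remaining -= n;
        n
    }
}

/// The removed check: a short read is the only signal that the chunk has ended.
/// Returns true if the end of the chunk was noticed after a single read.
fn end_detected_by_short_read(chunk: &mut Chunk, batch_size: usize) -> bool {
    let values_read = chunk.read_one_batch(batch_size);
    values_read < batch_size
}

/// One possible alternative (an assumption, not necessarily what the PR does):
/// ask the source directly whether anything is left after the read.
fn end_detected_explicitly(chunk: &mut Chunk, batch_size: usize) -> bool {
    chunk.read_one_batch(batch_size);
    chunk.remaining == 0
}
```

With 4 records remaining and a batch size of 4, `end_detected_by_short_read` returns `false` even though the chunk is exhausted, while `end_detected_explicitly` returns `true`. With 3 records remaining, both report the end.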



