thinkharderdev commented on code in PR #2552:
URL: https://github.com/apache/arrow-rs/pull/2552#discussion_r951378549


##########
parquet/src/arrow/record_reader/mod.rs:
##########
@@ -135,7 +135,11 @@ where
         loop {
             // Try to find some records from buffers that has been read into 
memory
             // but not counted as seen records.
-            let end_of_column = 
!self.column_reader.as_mut().unwrap().has_next()?;
+
+            // Check to see if the column is exhausted. Only peek the next 
page since in
+            // case we are reading to a page boundary and do not actually need 
to read
+            // the next page.
+            let end_of_column = 
!self.column_reader.as_mut().unwrap().peek_next()?;

Review Comment:
   I think this should ultimately get cleaned up a bit. It is confusing since 
we need to `peek_next` but also call `has_next` below (since the next page 
needs to get loaded). It feels like the page-level logic wants to be 
encapsulated inside `GenericColumnReader` but it inevitable leaks out in places 
like this. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to