thinkharderdev commented on code in PR #2552:
URL: https://github.com/apache/arrow-rs/pull/2552#discussion_r951378549
##########
parquet/src/arrow/record_reader/mod.rs:
##########
@@ -135,7 +135,11 @@ where
loop {
// Try to find some records from buffers that has been read into memory
// but not counted as seen records.
- let end_of_column = !self.column_reader.as_mut().unwrap().has_next()?;
+
+ // Check to see if the column is exhausted. Only peek the next page since in
+ // case we are reading to a page boundary and do not actually need to read
+ // the next page.
+ let end_of_column = !self.column_reader.as_mut().unwrap().peek_next()?;
Review Comment:
I think this should ultimately get cleaned up a bit. It is confusing that we
need to call `peek_next` here but also call `has_next` below (since the next
page needs to get loaded). It feels like the page-level logic wants to be
encapsulated inside `GenericColumnReader`, but it inevitably leaks out in
places like this.
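To make the `peek_next` / `has_next` split concrete, here is a toy model. It is not the parquet crate's API: `ToyColumnReader`, `read_records`, and the `pages_loaded` counter are hypothetical names for illustration, and only the method names `peek_next` and `has_next` come from the diff. The sketch assumes `peek_next` answers "are there unread values or another page?" without decoding anything, while `has_next` actually loads the next page when the current one is used up. Reading exactly up to a page boundary then never decodes the following page.

```rust
/// Hypothetical, simplified page-by-page column reader (not the real parquet type).
struct ToyColumnReader {
    pages: Vec<Vec<i32>>, // pages not yet loaded, stored in reverse so `pop` yields them in order
    current: Vec<i32>,    // values of the currently loaded page
    pos: usize,           // read position within `current`
    pages_loaded: usize,  // number of pages decoded so far
}

impl ToyColumnReader {
    fn new(mut pages: Vec<Vec<i32>>) -> Self {
        pages.reverse();
        Self { pages, current: Vec::new(), pos: 0, pages_loaded: 0 }
    }

    /// True if the current page has unread values or another page is available.
    /// Does NOT load the next page.
    fn peek_next(&self) -> bool {
        self.pos < self.current.len() || !self.pages.is_empty()
    }

    /// True if values remain, loading (decoding) the next page when needed.
    fn has_next(&mut self) -> bool {
        while self.pos >= self.current.len() {
            match self.pages.pop() {
                Some(page) => {
                    self.current = page;
                    self.pos = 0;
                    self.pages_loaded += 1;
                }
                None => return false,
            }
        }
        true
    }
}

/// Reads up to `n` values, mirroring the pattern in the diff: the end-of-column
/// check only peeks, and `has_next` is called only once more values are needed.
fn read_records(reader: &mut ToyColumnReader, n: usize) -> Vec<i32> {
    let mut out = Vec::with_capacity(n);
    loop {
        let end_of_column = !reader.peek_next();
        if out.len() == n || end_of_column {
            break;
        }
        // More values are required, so it is now safe to load the next page.
        if !reader.has_next() {
            break;
        }
        let available = reader.current.len() - reader.pos;
        let take = available.min(n - out.len());
        out.extend_from_slice(&reader.current[reader.pos..reader.pos + take]);
        reader.pos += take;
    }
    out
}
```

With two pages `[1, 2]` and `[3, 4]`, `read_records(&mut r, 2)` returns `[1, 2]` while `r.pages_loaded` stays at 1. Note that the caller still has to know about both methods, which is exactly the leak the comment describes.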