etseidl commented on code in PR #9374:
URL: https://github.com/apache/arrow-rs/pull/9374#discussion_r2788640422
##########
parquet/src/column/reader.rs:
##########
@@ -378,6 +392,7 @@ where
));
}
}
+
Review Comment:
```suggestion
```
clean up leftovers :)
##########
parquet/src/column/reader.rs:
##########
@@ -309,6 +309,20 @@ where
});
if let Some(rows) = rows {
+ // If there is a pending partial record from a previous page,
+ // count it before considering the whole-page skip. When the
+ // next page provides num_rows (e.g. a V2 data page or via
+ // offset index), its records are self-contained, so the
+ // partial from the previous page is complete at this boundary.
+ if let Some(decoder) = self.rep_level_decoder.as_mut() {
+ if decoder.flush_partial() {
Review Comment:
I think this is correct. If all pages are V2, then `has_partial` will never
be `true`, because V2 pages must start on a new record (R=0). If all pages are
V1, this will never trigger because `num_rows` will be `None`. This case only
applies when switching from V1 to V2, in which case it's appropriate to call
`flush_partial` because, as said above, V2 pages must start at a new record.
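To make the case analysis concrete, here is a minimal sketch under a simplified model. It assumes that `RecordCounter`, `Page`, and `count_records_across_pages` are hypothetical names, not the arrow-rs API. A page with `num_rows: Some(_)` stands in for a V2 page, or for a page whose row count comes from an offset index, and is skipped whole using that count. A page with `None` is decoded level by level, and its last record stays pending. The function returns the record total together with the number of times a pending record was flushed at a page boundary.

```rust
/// Hypothetical, simplified record counter over repetition levels.
/// This is NOT the arrow-rs decoder; it only models the
/// `has_partial` / `flush_partial` bookkeeping discussed above.
#[derive(Default)]
struct RecordCounter {
    /// True once a record has started (rep level 0 seen) but has not yet
    /// been closed by the next rep level 0 or by a flush.
    has_partial: bool,
}

impl RecordCounter {
    /// Decodes one page's repetition levels and returns how many records
    /// were *completed* inside it. A record is only known to be complete
    /// when the next rep level 0 appears, so the page's last record stays
    /// pending in `has_partial`.
    fn count_records(&mut self, rep_levels: &[i16]) -> usize {
        let mut completed = 0;
        for &level in rep_levels {
            if level == 0 {
                if self.has_partial {
                    completed += 1; // the previous record ends here
                }
                self.has_partial = true; // a new record begins
            }
            // level > 0 continues the current record
        }
        completed
    }

    /// Closes the pending record, if any. Returns true if one was flushed.
    fn flush_partial(&mut self) -> bool {
        std::mem::take(&mut self.has_partial)
    }
}

/// Hypothetical page model: `num_rows` is Some for V2 pages (or when an
/// offset index supplies it) and None for plain V1 pages.
struct Page {
    num_rows: Option<usize>,
    rep_levels: Vec<i16>,
}

/// Returns (total records, boundary flushes that found a pending record).
fn count_records_across_pages(pages: &[Page]) -> (usize, usize) {
    let mut counter = RecordCounter::default();
    let mut records = 0;
    let mut boundary_flushes = 0;
    for page in pages {
        match page.num_rows {
            Some(rows) => {
                // This page starts at a record boundary (rep level 0), so a
                // record left pending by a previous V1 page is complete here.
                if counter.flush_partial() {
                    records += 1;
                    boundary_flushes += 1;
                }
                // Whole-page skip: the page's records are self-contained, so
                // count them from `rows` without decoding any levels.
                records += rows;
            }
            // No row count (plain V1): decode levels; the last record stays
            // pending because the next page may continue it.
            None => records += counter.count_records(&page.rep_levels),
        }
    }
    // The end of the column chunk closes any record still pending.
    if counter.flush_partial() {
        records += 1;
    }
    (records, boundary_flushes)
}
```

In this model, the boundary flush never runs when every page is V1. When every page is V2, it runs but never finds a pending record. Only a V1-to-V2 switch actually flushes one. If that flush were missing, the pending record would survive the whole-page skip and be counted later, at the wrong position.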