jonded94 commented on code in PR #9374:
URL: https://github.com/apache/arrow-rs/pull/9374#discussion_r2805545515


##########
parquet/src/column/reader.rs:
##########
@@ -309,6 +309,20 @@ where
                 });
 
                 if let Some(rows) = rows {
+                    // If there is a pending partial record from a previous page,
+                    // count it before considering the whole-page skip. When the
+                    // next page provides num_rows (e.g. a V2 data page or via
+                    // offset index), its records are self-contained, so the
+                    // partial from the previous page is complete at this boundary.
+                    if let Some(decoder) = self.rep_level_decoder.as_mut() {
+                        if decoder.flush_partial() {

Review Comment:
   > IMO enough time has probably passed that we can just assume that records aren't split across pages
   
   Datapoint: The file that led to the original error message was written with
`arrow-rs` version 57.1.0: [parquet 
viewer](https://private-user-images.githubusercontent.com/30271979/546253888-ec67ea13-1ead-4430-af64-041773c38ecc.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzEwMDc2NzEsIm5iZiI6MTc3MTAwNzM3MSwicGF0aCI6Ii8zMDI3MTk3OS81NDYyNTM4ODgtZWM2N2VhMTMtMWVhZC00NDMwLWFmNjQtMDQxNzczYzM4ZWNjLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNjAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjYwMjEzVDE4MjkzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTZjZGRkY2MzM2Y1MTA2ZjdhNzE3ZmEyN2Q2ZmQ3NWM1MmE1ODZlYTg4N2RiYWE1YzRjODE3M2Y4ZjM2MzVlODgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.5Iz9mldg3Z7bQCkvSEaEe7HYRjmdBYU7xTNnXI_tfM0)
 and [more verbose debug dumps](https://github.com/apache/arrow-rs/issues/9370#issuecomment-3889847488).
   In any case, at least at my company, we probably have a few PiB of data written with this or an even earlier version.
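
For readers following the thread, here is a minimal sketch of the idea described in the diff comment above. It is not the arrow-rs implementation: `RecordCounter`, `count_page`, and `count_with_aligned_next_page` are hypothetical names, and only `flush_partial` borrows its name from the diff. The sketch assumes the usual Parquet convention that a repetition level of 0 marks the start of a new record, so the last record seen on a page cannot be counted until something confirms that it has ended.

```rust
/// Hypothetical helper (not part of arrow-rs): tracks whether the levels
/// consumed so far end in a record whose completion is not yet confirmed.
#[derive(Debug, Default)]
struct RecordCounter {
    /// True once levels have been seen for a record that is still open.
    has_partial: bool,
}

impl RecordCounter {
    /// Consume one page's repetition levels and return how many records
    /// were confirmed complete, i.e. were followed by a level of 0.
    fn count_page(&mut self, rep_levels: &[i16]) -> usize {
        let mut completed = 0;
        for &level in rep_levels {
            if level == 0 && self.has_partial {
                // A new record starts here, so the open one has ended.
                completed += 1;
            }
            self.has_partial = true;
        }
        completed
    }

    /// Close the open record, if any. Returns true when a record was
    /// pending, in which case the caller should count one more record.
    fn flush_partial(&mut self) -> bool {
        std::mem::take(&mut self.has_partial)
    }
}

/// Assumption for illustration: the next page is known to start on a record
/// boundary (for example because it reports its own row count), so the
/// record left open at the end of `page` is complete.
fn count_with_aligned_next_page(page: &[i16]) -> usize {
    let mut counter = RecordCounter::default();
    let mut records = counter.count_page(page);
    if counter.flush_partial() {
        records += 1;
    }
    records
}
```

With levels `[0, 1, 0, 0, 1]`, `count_page` confirms two records and leaves a third open; flushing at an aligned page boundary brings the total to three. If records could instead continue onto the next page, flushing at that point would overcount, which is why it matters whether existing files split records across pages.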


