etseidl commented on issue #6310: URL: https://github.com/apache/arrow-rs/issues/6310#issuecomment-2311264542
I think the relevant code is https://github.com/apache/arrow-rs/blob/ee2f75a66278dbd3e7aa6b85b5322951c792a58d/parquet/src/column/writer/mod.rs#L752-L779. For the final page (with 30 values), `null_page` should be false, so we should wind up at https://github.com/apache/arrow-rs/blob/ee2f75a66278dbd3e7aa6b85b5322951c792a58d/parquet/src/column/writer/mod.rs#L811-L816.

The chunk statistics look OK (min 1, max 1), so you'd expect the page statistics to be OK as well. They are created here: https://github.com/apache/arrow-rs/blob/ee2f75a66278dbd3e7aa6b85b5322951c792a58d/parquet/src/column/writer/mod.rs#L889-L902. Again, if the min/max were invalid in the page, you'd expect garbage in the chunk stats too.

Perhaps some print statements or breakpoints would help here. If the original file isn't sensitive, could you share it here?

cc @adriangb
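
As a rough illustration of the reasoning above (a bad page min/max should show up in the chunk stats), here is a simplified model, not the actual arrow-rs code. It assumes the chunk min/max are obtained by folding the min/max of each non-null page; `MinMax` and `chunk_stats` are hypothetical names used only for this sketch.

```rust
/// Min/max pair for one data page or for the whole column chunk.
/// Hypothetical type for illustration; not part of the parquet crate.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MinMax {
    min: i64,
    max: i64,
}

/// Fold per-page statistics into chunk-level statistics.
/// `None` stands for a page that holds only nulls and so has no min/max.
/// Assumption: the chunk stats are the running min/max over all page stats.
fn chunk_stats(pages: &[Option<MinMax>]) -> Option<MinMax> {
    pages
        .iter()
        .flatten() // skip all-null pages
        .copied()
        .reduce(|acc, page| MinMax {
            min: acc.min.min(page.min),
            max: acc.max.max(page.max),
        })
}
```

Under this model, a chunk whose pages are several all-null pages followed by one page with min 1 and max 1 yields chunk stats of min 1, max 1, while a single page with a garbage minimum would drag the chunk minimum down with it. That is why sane chunk stats make corrupt page stats look unlikely, and why printing both at the linked locations should narrow things down.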
