etseidl commented on PR #9374: URL: https://github.com/apache/arrow-rs/pull/9374#issuecomment-3874031028
I've looked at the code and the test, and I'm not quite sure this is the correct fix. Given the data (`[10, 20], [30, 40], [50, 60], [70, 80]`), I would expect `skip_records(2)` to skip over `[10, 20], [30, 40]` and then return `[50, 60]` as the next row (which is the behavior if all three pages use v1 headers). I think the issue you've identified is that when page header types are mixed, we can't trust `num_rows` just because we found it on a single page header. Instead, I think we need to detect mixed v1/v2 page headers and not use the `num_rows` shortcut when they are mixed.
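
As a rough illustration of the mixed-header check suggested above, here is a minimal sketch. The `PageHeader` enum and both helper functions are hypothetical and simplified for this example; they are not the arrow-rs types or API, and the actual fix may need to detect the mix differently (for instance, incrementally as pages are read).

```rust
/// Hypothetical, simplified page header (NOT the arrow-rs type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PageHeader {
    /// v1 data page: only the number of values is known from the header.
    V1 { num_values: usize },
    /// v2 data page: the header also records the number of rows.
    V2 { num_values: usize, num_rows: usize },
}

/// Returns true if the pages contain both v1 and v2 headers.
fn has_mixed_headers(pages: &[PageHeader]) -> bool {
    let has_v1 = pages.iter().any(|p| matches!(p, PageHeader::V1 { .. }));
    let has_v2 = pages.iter().any(|p| matches!(p, PageHeader::V2 { .. }));
    has_v1 && has_v2
}

/// Decides whether a page may be skipped wholesale using the header's
/// `num_rows`. Assumption for this sketch: the shortcut is only taken for
/// v2 pages, and only when no v1 pages are mixed in.
fn can_skip_whole_page(page: &PageHeader, remaining: usize, mixed: bool) -> bool {
    match page {
        PageHeader::V2 { num_rows, .. } if !mixed => *num_rows <= remaining,
        // v1 pages, or any page in a mixed chunk: fall back to decoding levels.
        _ => false,
    }
}
```

With this shape, a v2 page in a mixed chunk is never skipped on the strength of its header alone, so the record count is always derived the same way for every page.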
