jonded94 commented on PR #9374:
URL: https://github.com/apache/arrow-rs/pull/9374#issuecomment-3876339801
Hey @etseidl :wave:
Thanks for your review! I'm not super well versed around data page decoding
but I agree that after skipping 2 records, `[50, 60]` probably should be the
next decoded record.. I'm unfortunately super busy with internal bugfixes right
now, I just plugged that back into the AI slop machine and it came out with a
fix (d738d35) slightly different from your suggestion that *does* correctly
return `[50, 60]`:
```
The corrected fix (addressing reviewer @etseidl's feedback):
- Before: flush_partial() was called at the end of skip_records after all
records were skipped — this only cleared the stale has_partial state but didn't
fix the over-counting (3 records skipped instead
of 2).
- After: flush_partial() is called before the num_rows whole-page-skip
shortcut. This properly counts the pending partial record from the previous
page, adjusts remaining_records (2→1), and prevents the
v2 page from being entirely skipped (since rows=2 > remaining=1). The
skip then falls through to level-based decoding, correctly skipping only 1
record from page 2.
The corrected test asserts that after skip_records(2) over 4 records
[10,20], [30,40], [50,60], [70,80], read_records(1) returns [50, 60] (the 3rd
record), not [70, 80] (the 4th).
```
I'm really sorry I can't invest more time into that right now, but I hope
what I committed is now finally correct!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]