Re: [I] Support file row number in Parquet reader [arrow-rs]

via GitHub Mon, 27 Oct 2025 14:06:59 -0700


vustef commented on issue #7299:
URL: https://github.com/apache/arrow-rs/issues/7299#issuecomment-3453340084


   That's a very exciting change, @etseidl.
   
   However, since it is optional and no other writer will write it (if I 
interpret it correctly), I'm not sure we can rely on it. So the comment you 
made on the PR is still a concern:
   
   > One concern I have with the approach here is how to provide exact row 
numbers if we start selectively reading row group metadata. If we don't have 
metadata for all preceding row groups, we can't know the starting row number. 
This at least argues for reverting back to using an Option for the start index.
   
   So we would have to switch dynamically between a somewhat slower parse that 
calculates `first_row_index` and a faster one that doesn't provide that 
field. I'm not a fan of deciding this dynamically: if decoded data is cached 
(now or in the future, I'm not sure what exists), it is extra complexity to 
handle. But it seems the only alternative is to always decode every row 
group's num rows, which doesn't seem desirable from what I gather.
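
   To make the concern concrete, here is a minimal sketch of why the start index 
ends up as an `Option`. The `RowGroupInfo` type and `first_row_indexes` 
function are hypothetical stand-ins for illustration, not the arrow-rs API; 
the only assumption is that a row group's first row index is the sum of 
`num_rows` over all preceding row groups.

```rust
/// Hypothetical stand-in for per-row-group metadata; not the arrow-rs type.
#[derive(Debug, Clone, Copy)]
struct RowGroupInfo {
    /// `None` models a row group whose metadata was skipped during decoding.
    num_rows: Option<i64>,
}

/// Returns the first row index of each row group, or `None` where it cannot
/// be known because a preceding row group's `num_rows` was not decoded.
fn first_row_indexes(groups: &[RowGroupInfo]) -> Vec<Option<i64>> {
    // Running start offset; becomes `None` once any row count is missing.
    let mut next_start = Some(0i64);
    let mut out = Vec::with_capacity(groups.len());
    for g in groups {
        out.push(next_start);
        next_start = match (next_start, g.num_rows) {
            (Some(start), Some(n)) => Some(start + n),
            _ => None,
        };
    }
    out
}
```

   Once a single row group's count is missing, every later start index is 
unknown, which is why selective decoding and exact row numbers pull in 
opposite directions.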
   
   


