vustef commented on issue #7299: URL: https://github.com/apache/arrow-rs/issues/7299#issuecomment-3453340084
That's a very exciting change, @etseidl. However, since it is optional and no other writer will write it (if I interpret it correctly), I'm not sure we can rely on it. So the comment that you had on the PR is still a concern:

> One concern I have with the approach here is how to provide exact row numbers if we start selectively reading row group metadata. If we don't have metadata for all preceding row groups, we can't know the starting row number. This at least argues for reverting back to using an Option for the start index.

So we would have to switch dynamically between a somewhat slower parse that calculates `first_row_index` and a faster one that doesn't have that field. I'm not a fan of a dynamic decision here, because if there's caching of decoded data (in the future, or now, I'm not sure what exists), it's extra complexity to handle. But it seems it's either that or defaulting to always decoding the num rows of all row groups, which doesn't seem desirable from what I gather.
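To make the concern concrete, here is a minimal sketch of why the start index ends up as an `Option`. It assumes a hypothetical `RowGroupInfo` type and a hypothetical `first_row_indexes` helper. Neither is the parquet crate's actual API. A skipped row group is modelled as `None`.

```rust
/// Hypothetical stand-in for decoded row group metadata.
/// This is NOT the parquet crate's `RowGroupMetaData`.
#[derive(Debug, Clone, Copy)]
pub struct RowGroupInfo {
    pub num_rows: u64,
}

/// Computes the first row index of each row group as a running sum of the
/// preceding row counts. `None` in the input models a row group whose
/// metadata was skipped; every row group after it gets `None`, because its
/// starting row number can no longer be derived.
pub fn first_row_indexes(groups: &[Option<RowGroupInfo>]) -> Vec<Option<u64>> {
    let mut next_start: Option<u64> = Some(0);
    let mut out = Vec::with_capacity(groups.len());
    for group in groups {
        // The start of this row group is known only if every earlier
        // row count was known.
        out.push(next_start);
        next_start = match (next_start, group) {
            (Some(start), Some(info)) => Some(start + info.num_rows),
            _ => None,
        };
    }
    out
}
```

With row counts 100, 50, skipped, 10, the first three start indexes are known and the last is `None`. That is the case where a reader would have to fall back to decoding the num rows of every preceding row group.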
