HippoBaro commented on PR #9697: URL: https://github.com/apache/arrow-rs/pull/9697#issuecomment-4310847578
@alamb @etseidl @nathanb9 I have a local WIP branch that I think gets us into a better place, but every approach I’ve tried so far adds a fair amount of complexity around deciding when a row group can be safely released.

> I also worry it will be complicated to track the exact ranges needed/not needed, and it adds a new non trivial constraint on the decoder to do range tracking.

I agree we should not try to be more granular than the row-group level right now, but even at that level the logic is still fairly subtle because row groups can span discontiguous byte ranges.

The core difficulty is that deciding whether a row group can be released depends on queued row groups, the remaining selection, the running offset/limit budget, and whether predicates require the decoder to stay conservative. Today that state is split across multiple components, which makes the release policy hard to centralize cleanly.

So, rather than keep layering that complexity directly into this PR, I started a separate refactor in #9804 to simplify that part of the decoder first. Once that work has been reviewed and merged, I plan to come back to this PR on top of it, which should make the buffer-release changes much simpler and easier to reason about.

Let me know if that plan sounds reasonable!

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
