HippoBaro commented on PR #9697:
URL: https://github.com/apache/arrow-rs/pull/9697#issuecomment-4310847578

   @alamb @etseidl @nathanb9 I have a local WIP branch that I think gets us 
into a better place, but every approach I’ve tried so far adds a fair amount of 
complexity around deciding when a row group can be safely released.
   
   > I also worry it will be complicated to track the exact ranges needed/not 
needed, and it adds a new non trivial constraint on the decoder to do range 
tracking.
   
   I agree we should not try to be more granular than the row-group level right 
now, but even at that level the logic is still fairly subtle because row groups 
can span discontiguous byte ranges. 
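
   To illustrate the discontiguity: a minimal sketch, assuming a row group's buffered data can be described as one byte range per projected column chunk. The `coalesce` helper is hypothetical and is not part of arrow-rs. It only shows that gaps remain after adjacent ranges are merged, so a single start/end pair cannot describe what a row group holds.

```rust
use std::ops::Range;

/// Hypothetical helper (not an arrow-rs API): merge the byte ranges of one
/// row group's projected column chunks into maximal contiguous spans.
fn coalesce(mut ranges: Vec<Range<u64>>) -> Vec<Range<u64>> {
    // Sort by start offset so overlapping or touching ranges become neighbors.
    ranges.sort_by_key(|r| r.start);
    let mut out: Vec<Range<u64>> = Vec::new();
    for r in ranges {
        if let Some(last) = out.last_mut() {
            // Touching or overlapping: extend the previous span.
            if r.start <= last.end {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        // A gap precedes this range (for example a column chunk that was
        // not projected), so it starts a new span.
        out.push(r);
    }
    out
}
```

   For example, `coalesce(vec![400..500, 0..100, 100..250])` yields `[0..250, 400..500]`: two spans for one row group, with unbuffered bytes in between.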
   
   The core difficulty is that deciding whether a row group can be released 
depends on queued row groups, the remaining selection, the running offset/limit 
budget, and whether predicates require the decoder to stay conservative. Today 
that state is split across multiple components, which makes the release policy 
hard to centralize cleanly.
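
   As a rough illustration of what a centralized policy could look like (a sketch under simplifying assumptions, not the design in either PR): every name below is hypothetical, and the rules are my simplification of the inputs listed above. It assumes the queue, the remaining selection, the offset/limit budget, and the predicate state can all be read from one place.

```rust
/// Hypothetical, simplified view of the state a release decision depends on.
/// None of these names come from arrow-rs.
struct ReleaseState {
    /// Queued row groups in decode order: (row group index, rows still selected).
    queued: Vec<(usize, usize)>,
    /// Rows still needed to satisfy offset + limit; `None` means unbounded.
    budget: Option<usize>,
    /// True while row filters make the surviving row count unpredictable.
    has_predicates: bool,
}

impl ReleaseState {
    /// Returns true if the buffers held for `row_group` are no longer needed.
    fn can_release(&self, row_group: usize) -> bool {
        // Not queued: already decoded or skipped, so nothing will read it.
        let Some(pos) = self.queued.iter().position(|&(idx, _)| idx == row_group) else {
            return true;
        };
        // The remaining selection picks no rows from this row group.
        if self.queued[pos].1 == 0 {
            return true;
        }
        // With predicates we cannot predict how many rows earlier row groups
        // will produce, so stay conservative and keep the data.
        if self.has_predicates {
            return false;
        }
        match self.budget {
            // No offset/limit: every selected row group will be decoded.
            None => false,
            // Releasable only if earlier queued row groups already cover the budget.
            Some(budget) => {
                let rows_before: usize =
                    self.queued[..pos].iter().map(|&(_, n)| n).sum();
                rows_before >= budget
            }
        }
    }
}
```

   Even in this reduced form, the answer for one row group depends on every row group queued ahead of it, which is why the decision is awkward to make while the inputs live in separate components.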
   
   So, rather than keep layering that complexity directly into this PR, I 
started a separate refactor in #9804 to simplify that part of the decoder 
first. Once that work has been reviewed and merged, I plan to come back to this 
PR on top of it, which should make the buffer-release changes much simpler and 
easier to reason about.
   
   Let me know if that plan sounds reasonable!

