Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

via GitHub Wed, 09 Jul 2025 07:04:00 -0700


zhuqi-lucas commented on PR #16711:
URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3052805272


   > > The adaptive selection will help Q30 and Q31 from previous PR result:
   > > [apache/arrow-rs#7454 
(comment)](https://github.com/apache/arrow-rs/pull/7454#issuecomment-2840770864)
   > 
   > @zhuqi-lucas my largest concern about all the previous adaptive row 
selection work was the software engineering / keeping the complexity under 
control.
   > 
   > I have been dreaming about this -- I am thinking it might be time to 
create an internal RowSelection representation like
   > 
   > ```rust
   > enum InternalRowSelector {
   >   /// Skip the next n rows
   >   Skip(usize),
   >   /// Decode and output the next n rows
   >   Decode(usize),
   >    /// Decode the next filter.len() rows and apply the filter before 
outputting the rows
   >   DecodeAndFilter(Arc<BooleanArray>),
   > }
   > ```
   > 
   > Then the actual RecordReader would take a Vec` or something
   > 
   > To stage / keep things of reasonable complexity we could make one PR that 
refactors to
   > 
   > ```rust
   > enum InternalRowSelector {
   >   /// Skip the next n rows
   >   Skip(usize),
   >   /// Decode and output the next n rows
   >   Decode(usize),
   >    // NO DecodeAndFilter variant
   > }
   > ```
   > 
   > And then add the DecodeAndFilter as a follow on PR
   > 
   > Maybe I'll try and make a PR
   
   Thank you @alamb , i agree, even the adaptive selection ratio is 
experimented by random selection in my PR, i can't find a good way to do it 
clearly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

Reply via email to