zhuqi-lucas commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3052805272
> > The adaptive selection will help Q30 and Q31 from previous PR result: > > [apache/arrow-rs#7454 (comment)](https://github.com/apache/arrow-rs/pull/7454#issuecomment-2840770864) > > @zhuqi-lucas my largest concern about all the previous adaptive row selection work was the software engineering / keeping the complexity under control. > > I have been dreaming about this -- I am thinking it might be time to create an internal RowSelection representation like > > ```rust > enum InternalRowSelector { > /// Skip the next n rows > Skip(usize), > /// Decode and output the next n rows > Decode(usize), > /// Decode the next filter.len() rows and apply the filter before outputting the rows > DecodeAndFilter(Arc<BooleanArray>), > } > ``` > > Then the actual RecordReader would take a Vec` or something > > To stage / keep things of reasonable complexity we could make one PR that refactors to > > ```rust > enum InternalRowSelector { > /// Skip the next n rows > Skip(usize), > /// Decode and output the next n rows > Decode(usize), > // NO DecodeAndFilter variant > } > ``` > > And then add the DecodeAndFilter as a follow on PR > > Maybe I'll try and make a PR Thank you @alamb , i agree, even the adaptive selection ratio is experimented by random selection in my PR, i can't find a good way to do it clearly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org