zhuqi-lucas commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2906813523
> > And the default is selector because i use it to compute avg_size_of_selector.
>
> Make sense -- thank you
>
> I found [`SlicesIterator`](https://docs.rs/arrow/latest/arrow/compute/struct.SlicesIterator.html) when looking at the Bitmap --> RowSelection code the other day. I think that could be used to determine the "average run length" so we could continue to use `skip/select` for large contiguous runs but switch to bitmap when there are smaller
>
> The other thing I couldn't easily work out was if there was any way to switch from `select/skip` _within_ a output batch, or if the plan needs to be either `RowSelector` or `BitMap` for each output batch
>
> Or maybe we could just add a third type of `ReadPlan`, namely `ReadPlan::Bitmap` 🤔

Thank you @alamb, this is a very good point:

1. In my testing the choice is made per output batch: each output batch uses either `RowSelector` or `BitMap`. This is because the best choice can alternate (8192 => bitmap, 8192 => selector, 8192 => bitmap...), so using only one of them everywhere cannot be optimal.

2. I think the best way to optimize is:
   - Keep a basic default window size of 8192 (the batch size) for the adaptive decision and, as in the case above, choose bitmap or selector per window.
   - Also support merging adjacent windows of the same type. For example, suppose an output covers 5 batch-size windows:
     1) 8192 => bitmap
     2) 8192 => bitmap
     3) 8192 => selector
     4) 8192 => selector
     5) 8192 => bitmap

     We can merge 1 and 2 because they are both bitmap, and merge 3 and 4 because they are both selectors. Window 5 remains as a single bitmap.

But I think we can start with the basic optimization: use only the batch-size window to decide between bitmap and selector, and optimize further later.

Maybe `ReadPlan` can keep only the selector, and as a first step each adaptive window (currently fixed at the batch size) switches to a bitmap if it is dense...
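To make the window idea concrete, here is a rough sketch, not the code in this PR. It assumes a plain `&[bool]` mask instead of an Arrow `BooleanArray`, and every name in it (`Strategy`, `Window`, `avg_run_length`, `plan_windows`, the `min_avg_run` threshold) is hypothetical and only for illustration. Each fixed-size window picks bitmap when its average run length is short and selector otherwise, and adjacent windows with the same choice are merged:

```rust
/// Hypothetical per-window strategy tag (not an arrow-rs type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    Bitmap,
    Selector,
}

/// A contiguous range of rows that all use one strategy.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Window {
    start: usize,
    len: usize,
    strategy: Strategy,
}

/// Average length of the runs of equal values in `mask` (0.0 if empty).
fn avg_run_length(mask: &[bool]) -> f64 {
    if mask.is_empty() {
        return 0.0;
    }
    // A new run starts wherever two neighbouring values differ.
    let runs = 1 + mask.windows(2).filter(|w| w[0] != w[1]).count();
    mask.len() as f64 / runs as f64
}

/// Pick a strategy for each fixed-size window, merging adjacent windows
/// that picked the same one. `min_avg_run` is an assumed tuning knob:
/// windows whose average run is shorter than it use the bitmap.
fn plan_windows(mask: &[bool], window: usize, min_avg_run: f64) -> Vec<Window> {
    assert!(window > 0, "window size must be non-zero");
    let mut plan: Vec<Window> = Vec::new();
    for (i, chunk) in mask.chunks(window).enumerate() {
        let strategy = if avg_run_length(chunk) < min_avg_run {
            Strategy::Bitmap
        } else {
            Strategy::Selector
        };
        let same_as_prev = plan.last().map_or(false, |p| p.strategy == strategy);
        if same_as_prev {
            // Same type as the previous window: extend it instead of adding one.
            if let Some(prev) = plan.last_mut() {
                prev.len += chunk.len();
            }
        } else {
            plan.push(Window { start: i * window, len: chunk.len(), strategy });
        }
    }
    plan
}
```

With the 5-window pattern from the example (bitmap, bitmap, selector, selector, bitmap) this yields three merged ranges. The run-length threshold would need to come from benchmarks; the value is not something I have measured.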