zhuqi-lucas commented on PR #7537:
URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2906813523

   > > And the default is selector because i use it to compute 
avg_size_of_selector.
   > 
   > Make sense -- thank you
   > 
   > I found 
[`SlicesIterator`](https://docs.rs/arrow/latest/arrow/compute/struct.SlicesIterator.html)
 when looking at the Bitmap --> RowSelection code the other day. I think that 
could be used to determine the "average run length" so we could continue to use 
`skip/select` for large contiguous runs but switch to bitmap when there are 
smaller
   > 
   > The other thing I couldn't easily work out was if there was any way to 
switch from `select/skip` _within_ a output batch, or if the plan needs to be 
either `RowSelector` or `BitMap` for each output batch
   > 
   > Or maybe we could just add a third type of `ReadPlan`, namely 
`ReadPlan::Bitmap` 🤔
   
   Thank you @alamb, this is a very good point:
   
   1. In my testing, each output batch uses either `RowSelector` or 
`BitMap`, chosen independently per batch.
   
   That is because the best choice can alternate: 8192 => bitmap, 8192 => 
selector, 8192 => bitmap...
   
   A single strategy for the whole read can't be optimal in that case.
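To make the per-window decision concrete, here is a minimal sketch in plain Rust. It is not arrow-rs code: `WindowKind`, `avg_run_length`, `plan_windows`, and the `min_avg_run` threshold are hypothetical names chosen for illustration, and the "average run length" metric is the one suggested in the quoted comment, computed here on a plain `&[bool]` mask instead of via `SlicesIterator`.

```rust
/// Hypothetical per-window strategy; not an arrow-rs type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WindowKind {
    Bitmap,
    Selector,
}

/// Average length of the runs of identical values in `mask`.
/// Returns 0.0 for an empty mask.
fn avg_run_length(mask: &[bool]) -> f64 {
    if mask.is_empty() {
        return 0.0;
    }
    // A new run starts wherever two neighbouring values differ.
    let runs = 1 + mask.windows(2).filter(|w| w[0] != w[1]).count();
    mask.len() as f64 / runs as f64
}

/// Choose one strategy per window of `window` rows.
/// `min_avg_run` is an assumed tuning knob: windows whose runs are
/// shorter than this on average are read with a bitmap.
fn plan_windows(mask: &[bool], window: usize, min_avg_run: f64) -> Vec<WindowKind> {
    assert!(window > 0, "window must be non-zero");
    mask.chunks(window)
        .map(|chunk| {
            if avg_run_length(chunk) < min_avg_run {
                WindowKind::Bitmap
            } else {
                WindowKind::Selector
            }
        })
        .collect()
}
```

With a window of 8 rows and a threshold of 4.0, a window of strictly alternating rows would be planned as `Bitmap`, and a fully selected window as `Selector`. The real threshold would have to come from benchmarks.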
   
   2. I think the best way to optimize is:
   
   - Use the batch size (8192) as the default window size for the adaptive 
decision; as in the case above, we pick bitmap or selector per window.
   - Additionally, support merging adjacent windows of the same type.
   
   For example, suppose that after selection we have 5 batch-size windows:
   
   1) 8192 => bitmap 
   2) 8192 => bitmap 
   3) 8192 => selector
   4) 8192 => selector 
   5) 8192 => bitmap
   
   We can merge 1 and 2 because they are both bitmaps.
   We can merge 3 and 4 because they are both selectors.
   The remaining one (5) stays a bitmap.
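The merging step can be sketched as follows. This is only an illustration in plain Rust: `WindowKind`, `Segment`, and `merge_windows` are hypothetical names, not existing arrow-rs or parquet APIs.

```rust
/// Hypothetical per-window strategy; not an arrow-rs type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WindowKind {
    Bitmap,
    Selector,
}

/// A contiguous span of rows read with a single strategy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Segment {
    kind: WindowKind,
    rows: usize,
}

/// Coalesce adjacent windows that share the same strategy.
fn merge_windows(windows: &[Segment]) -> Vec<Segment> {
    let mut merged: Vec<Segment> = Vec::new();
    for w in windows {
        if let Some(last) = merged.last_mut() {
            if last.kind == w.kind {
                // Same strategy as the previous segment: extend it.
                last.rows += w.rows;
                continue;
            }
        }
        // Different strategy (or first window): start a new segment.
        merged.push(*w);
    }
    merged
}
```

For the five windows listed above (bitmap, bitmap, selector, selector, bitmap, each 8192 rows), this yields three segments: a 16384-row bitmap, a 16384-row selector, and an 8192-row bitmap.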
   
   
   But I think we can start with the basic optimization: use only the 
batch-size window to decide between bitmap and selector. We can optimize 
further later.
   
   
   Maybe `ReadPlan` can keep only selectors, and as a first step, for each 
adaptive window (currently fixed to the batch size), we switch to a bitmap 
if it's dense...
   
-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
