zhuqi-lucas commented on PR #7537:
URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2907789055

   > > But I think we can start with the basic optimization: only use a batch-size window to decide whether to choose a bitmap or a selector. Later, we can optimize further.
   > 
   > This is an interesting idea and I think it is worth exploring
   > 
   > > Maybe we can keep only the selector for ReadPlan, but as a first step, for the adaptive window size (currently fixed at the batch size), we can switch to a bitmap if the window is dense...
   > 
   > 👍
   > 
   > Another thing that makes this tricky in my mind is that a `batch_size` of `8000` requires the total number of `1`s in the mask to be `8000`, while the mask itself can be substantially larger (e.g. it could be `16000` rows long and select every other row) 🤔
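To make the window-sizing point concrete, here is a minimal std-only sketch. It assumes the filter is a plain `&[bool]` mask. `Selector`, `WindowPlan`, `window_len`, `to_selectors`, `plan_window` and the `min_avg_run` threshold are hypothetical names for illustration, not arrow-rs APIs.

The window grows until it holds `batch_size` selected rows, so a mask that selects every other row needs 16000 mask rows for a `batch_size` of 8000. The average-run-length heuristic is only one possible way to pick bitmap vs selector. It is not necessarily what this PR does.

```rust
/// Hypothetical stand-in for one skip/select run (not the arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
pub struct Selector {
    pub row_count: usize,
    pub skip: bool,
}

/// Hypothetical representation chosen for one window of the mask.
#[derive(Debug, PartialEq)]
pub enum WindowPlan {
    Bitmap(Vec<bool>),
    Selectors(Vec<Selector>),
}

/// Number of mask rows needed to collect `batch_size` selected rows,
/// or the whole mask if it selects fewer than that.
pub fn window_len(mask: &[bool], batch_size: usize) -> usize {
    let mut selected = 0;
    for (i, &keep) in mask.iter().enumerate() {
        if keep {
            selected += 1;
            if selected == batch_size {
                return i + 1;
            }
        }
    }
    mask.len()
}

/// Collapses a mask window into alternating skip/select runs.
pub fn to_selectors(window: &[bool]) -> Vec<Selector> {
    let mut runs: Vec<Selector> = Vec::new();
    for &keep in window {
        let skip = !keep;
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                // Same kind as the previous run: extend it.
                last.row_count += 1;
                continue;
            }
        }
        runs.push(Selector { row_count: 1, skip });
    }
    runs
}

/// Hypothetical heuristic: if the average run length in the window is
/// below `min_avg_run`, the selector list is fragmented, so use a bitmap.
/// Returns the chosen plan and how many mask rows the window consumed.
pub fn plan_window(mask: &[bool], batch_size: usize, min_avg_run: usize) -> (WindowPlan, usize) {
    let len = window_len(mask, batch_size);
    let window = &mask[..len];
    let runs = to_selectors(window);
    // Average run length = len / runs.len(); compared without division.
    if len < runs.len() * min_avg_run {
        (WindowPlan::Bitmap(window.to_vec()), len)
    } else {
        (WindowPlan::Selectors(runs), len)
    }
}
```

Take a mask that selects every other row. A `batch_size` of 8000 consumes 16000 mask rows and yields 16000 runs of length 1, so the sketch falls back to a bitmap. A clustered mask (a long skip followed by a long select) stays as two selector runs.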
   
   Very good point, @alamb! It's hard for us to reduce its overhead. Maybe we can set something like `max_bitmap_iterator`:
   
   When the bitmap iterator goes past `max_bitmap_iterator`, we can consume what it has so far as an output batch, and then merge those batches at the end. But I am not sure whether this will perform worse than using a selector.
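Here is a rough sketch of the `max_bitmap_iterator` idea. It assumes the cap is measured in mask rows scanned per step, which the comment leaves open, and uses a `Vec<i64>` as a stand-in for a column. `filter_in_chunks` and `max_rows` are hypothetical names.

Each bounded chunk of the bitmap becomes a partial output batch, and the partials are concatenated at the end. That final copy is the extra cost that could make this slower than a selector-based read.

```rust
/// Filters `values` by `mask`, scanning at most `max_rows` mask rows per step.
/// `max_rows` is a hypothetical stand-in for the proposed `max_bitmap_iterator`.
/// Returns the merged output and the number of partial batches produced.
pub fn filter_in_chunks(values: &[i64], mask: &[bool], max_rows: usize) -> (Vec<i64>, usize) {
    assert_eq!(values.len(), mask.len(), "mask must cover every row");
    assert!(max_rows > 0, "max_rows must be positive");

    // Each bounded chunk of the bitmap is consumed into its own partial batch.
    let mut partials: Vec<Vec<i64>> = Vec::new();
    for (vals, bits) in values.chunks(max_rows).zip(mask.chunks(max_rows)) {
        let partial: Vec<i64> = vals
            .iter()
            .zip(bits)
            .filter_map(|(&v, &keep)| if keep { Some(v) } else { None })
            .collect();
        partials.push(partial);
    }

    // Merge step: this extra copy is the overhead that could make the
    // approach slower than a selector-based read.
    let num_partials = partials.len();
    (partials.concat(), num_partials)
}
```

For example, filtering 10 rows with every other row selected and `max_rows = 4` produces 3 partial batches, `[1, 3]`, `[5, 7]` and `[9]`, which merge into `[1, 3, 5, 7, 9]`.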

